ALIGNMENTS



RESULT 1 
AAE31705 

ID AAE31705 standard; protein; 673 AA. 
XX 

AC AAE31705; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Human ABCG8 protein. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5. 
XX 



OS Homo sapiens. 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823 . 
XX 

PR 20-NOV-2000; 2000US-0252235P . 

PR 28-NOV-2000; 2000US-0253645P .
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 



DR N-PSDB; AAD48883. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies.

XX 

PS Claim 22; Page 81-82; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG8 protein 
XX 

SQ Sequence 673 AA; 



Query Match 100.0%; Score 3506; DB 6; Length 673; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 673; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60

Qy 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

Qy 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I II I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I II I I 
Db 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 



Qy 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I II I I I I
Db 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240

Qy 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300



Qy 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 

I I I M I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 

Qy 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

I I I I II I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480

I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
Db 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480

Qy 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWF 540

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWF 540 

Qy 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600 

I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600 

Qy 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660

Qy 661 SLRFIKQKPSQDW 673 

I I I I I I I I I I I I I 
Db 661 SLRFIKQKPSQDW 673 



RESULT 2 


ABP52129 


ID ABP52129 standard; protein; 673 AA.
XX

AC ABP52129;
XX

DT 10-OCT-2002 (first entry)
XX

DE Homo sapiens ABC transporter ABCG8 protein SEQ ID NO: 81.
XX

KW ATP-binding cassette transporter; ABC transporter; modulation; D loop;
KW cancer; bacterial infection; fungal infection; protozoal infection;
KW antibacterial; fungicide; protozoacide.
XX

OS Homo sapiens.
XX

PN EP1217066-A1.
XX

PD 26-JUN-2002.
XX

PF 21-DEC-2000; 2000EP-00870316.
XX

PR 21-DEC-2000; 2000EP-00870316.
XX





PA (UYGE-) UNIV GENT. 
XX 

DR WPI; 2002-550404/59. 
XX 

PT Modulating activity of ATP-binding cassette (ABC) transporters by 

PT influencing dimerization of nucleotide binding domains through use of D 

PT loop sequence of an ABC transporter, or its antisense peptide or peptide 

PT mimetic. 

XX 

PS Disclosure; Fig 3; 290pp; English. 
XX 

CC The present invention describes a method (M1) for modulating the activity

CC of ATP-binding cassette (ABC) transporters by influencing the

CC dimerisation of the nucleotide binding domains comprises using: (a) a

CC polypeptide (polyp) consisting of 5-50 amino acids comprising the D loop

CC sequence of an ABC transporter (ABP52049 to ABP52091); (b) a polyp

CC consisting of the D loop sequence of an ABC transporter; (c) a peptide

CC mimetic or antisense peptide of (a) or (b). ABC transporters have

CC antibacterial, fungicide and protozoacide activities. (M1) is useful for

CC selectively modulating the activity of ABC transporters belonging to the

CC group of multidrug transporter/P-glycoproteins. Bacterial, fungal or

CC protozoal ABC transporters are involved in the infection of a mammal or

CC in the induction of resistance to antibiotics or drugs in a mammal. (M1)

CC is useful for preventing, treating or alleviating diseases associated 

CC with functionality of an ABC transporter. ABP52092 to ABP52140 represent 

CC ABC transporter proteins given in the exemplification of the present 

CC invention 

XX 

SQ Sequence 673 AA; 

Query Match 99.9%; Score 3502; DB 5; Length 673; 

Best Local Similarity 99.9%; Pred. No. 0; 

Matches 672; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 

Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 

Qy 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

Qy 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240

Qy 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 

Qy 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 



Qy 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I I II I I I 

Db 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480

Qy 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWF 540

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWF 540

Qy 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
Db 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600 

Qy 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSVMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

Qy 661 SLRFIKQKPSQDW 673 

I I I I I I I I I I I I I 
Db 661 SLRFIKQKPSQDW 673 



RESULT 3 
AAE31703 

ID AAE31703 standard; protein; 672 AA. 
XX 

AC AAE31703; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG8 protein. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 
KW sitosterolaemia; hyperlipidaemia; hypercholesterolemia; gall stone; 
KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 
KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 
KW ABCG5. 
XX 

OS Mus sp.
XX 

FH Key Location/Qualifiers 
FT Misc-difference 440

FT /note= "Encoded by AAG" 

XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823 .
XX 

PR 20-NOV-2000; 2000US-0252235P . 
PR 28-NOV-2000; 2000US-0253645P . 
XX 



PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR N-PSDB; AAD48881. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or

PT nutritional deficiencies. 

XX 

PS Claim 22; Page 76; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG8 protein 
XX 

SQ Sequence 672 AA; 

Query Match 82.4%; Score 2888.5; DB 6; Length 672; 

Best Local Similarity 81.9%; Pred. No. 1.5e-287; 

Matches 551; Conservative 53; Mismatches 68; Indels 1; Gaps 1; 

Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 

Ml || I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I : M 
Db 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

Qy 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120

I I I I I I I I I l l l • I l I I i*iiiiii*iiiiiimi i i i i i i i i i i i i i i i
Db 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120

Qy 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180

I I I I I I • I I I i I I I I M I I • I I I i i i i i i i I i i i * I Ill i i > i
Db 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180

Qy 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240

I I I I I I I M I I I I i i i i i i i i • i i i i I 1111*11 t i i i i i t i
Db 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYWGVSGGERRRVSIGVQLLWNPGILILDEPT 240

Qy 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300

Db 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGT^AQQM 300

Qy 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360

I I I I I : I I : I I I I I I I I I I I I I I I I I I I II I : I : I : I I I I I I I I I I I I I I I I : I I I I
Db 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360



Qy 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

I I I I I : I : I I I I : I : :: I I :: I I : I I I I I II I I I I I I II I I I I I 

Db 361 WKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419 



Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480

I : I I I I I I : I I I I I : I I I :: I I I I I I I I I I I I I I I I I I I I I II I I I : I I !: I I I : I I I I 
Db 420 GSEACLMSLIIGFLYYGHGALQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYY 479

Qy 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWF 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I II III I I I I : I I I I I I 

Db 480 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWF 539 

Qy 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600

III I I I I I : I : I I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I II : I I I I I I I 
Db 540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599 

Qy 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

I | | : | M : I : I I I : : II : : I II : I : I : I I I I I I I I I I I : I I I : III: 
Db 600 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 659 

Qy 661 SLRFIKQKPSQDW 673 

11:1111 III 
Db 660 SLKLIKQKSIQDW 672 
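

NOTE: The summary figures reported for each result appear to be related by simple arithmetic. The sketch below is an inference from the numbers in this listing, not a documented definition from the search program: it assumes "Best Local Similarity" is Matches divided by all aligned columns (Matches + Conservative + Mismatches + Indels), and that "Query Match" is the hit Score divided by the query's self-match Score of 3506 from RESULT 1. The function names are hypothetical.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


def best_local_similarity(matches, conservative, mismatches, indels):
    # Assumption: identities over every aligned column, indel positions included.
    columns = matches + conservative + mismatches + indels
    return percent(matches, columns)


def query_match(score, query_self_score=3506.0):
    # Assumption: hit score relative to the query aligned against itself (RESULT 1).
    return percent(score, query_self_score)


if __name__ == "__main__":
    # RESULT 3 (mouse ABCG8): Matches 551; Conservative 53; Mismatches 68; Indels 1
    print(best_local_similarity(551, 53, 68, 1), query_match(2888.5))
    # RESULT 5 (Arabidopsis fragment): Matches 211; Conservative 117; Mismatches 269; Indels 91
    print(best_local_similarity(211, 117, 269, 91), query_match(730.5))
```

Under these assumptions the computed values agree with the reported 81.9% / 82.4% for RESULT 3 and 30.7% / 20.8% for RESULT 5.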



RESULT 4 


ABG61539 


ID ABG61539 standard; protein; 374 AA.
XX

AC ABG61539;
XX

DT 27-AUG-2002 (first entry)
XX

DE Human transporter and ion channel, TRICH9, Incyte ID 6585710CD1.
XX

KW Human; transporter and ion channel; TRICH; transport disorder;
KW neurological disorder; muscle disorder; immunological disorder; cancer;
KW scleroderma; systemic lupus erythematosus; allergy; leukaemia;
KW cell proliferative disorder; cervical cancer; breast cancer;
KW neurodegenerative disorder; Parkinson's disease; Alzheimer's disease;
KW myotonic dystrophy; catatonia; endocrine disorder; diabetes;
KW Grave's disease; gastrointestinal disorder; Crohn's disease;
KW renal disorder; Goodpasture's syndrome; viral infection; cirrhosis;
KW bacterial infection; fungal infection; parasitic infection;
KW protozoal infection; helminthic infection; cardiovascular disorder;
KW atherosclerosis; hepatic disease.
XX

OS Homo sapiens.
XX

PN WO200240541-A2.
XX

PD 23-MAY-2002.
XX

PF 25-OCT-2001; 2001WO-US046055.
XX

PR 27-OCT-2000; 2000US-0243989P.
PR 03-NOV-2000; 2000US-0245904P.
PR 09-NOV-2000; 2000US-0247673P.
PR 17-NOV-2000; 2000US-0249661P.
PR 20-NOV-2000; 2000US-0252232P.
PR 01-DEC-2000; 2000US-0250790P.
XX 

PA (INCY-) INCYTE GENOMICS INC. 
XX 

PI Tang YT, Yue H, Nguyen DB, Hafalia AJA, Elliott VS, Lu Y; 

PI Walia NK, Yao MG, Baughn MR, Gandhi AR, Ding L, Sanjanwala M; 

PI Ramkumar J, Arvizu C, Gietzen KJ, Lai PG, Azimzai Y, Khan FA; 

PI Thangavelu K, Thornton M, Lu DAM, Tribouley CM, Warren BA, Ison CH; 

PI Das D, Raumann BE, Policky JL, Kearney L; 

XX 

DR WPI; 2002-463570/49. 

DR N-PSDB; ABK83218. 
XX 

PT New transporters and ion channels (TRICH) polypeptides, useful for 

PT diagnosing, preventing, and treating disorders associated with an 

PT abnormal expression or activity of TRICH, e.g. immunological, muscular or 

PT renal disorders. 

XX 

PS Claim 1; Page 143-144; 178pp; English. 
XX 

CC The invention relates to human transporters and ion channels (TRICH) 

CC polypeptides, a naturally occurring amino acid sequence 90 % identical to 

CC TRICH, a biologically active fragment of TRICH or an immunogenic fragment

CC of TRICH. Also included are an isolated polynucleotide encoding TRICH, a 

CC recombinant polynucleotide comprising a promoter sequence operably linked 

CC to the TRICH polynucleotide, a cell transformed with the recombinant 

CC polynucleotide, a transgenic organism comprising the recombinant 

CC polynucleotide, an isolated antibody that binds specifically to TRICH, 

CC and screening for compounds which bind to TRICH, modulate TRICH, modulate 

CC TRICH expression or are ant/agonists of TRICH. The polypeptides are 

CC useful for diagnosing, treating, and preventing transport, neurological, 

CC muscle, immunological disorders (e.g. scleroderma, systemic lupus 

CC erythematosus, allergies), cell proliferative disorders such as cancers 

CC (e.g. leukaemia, cervical or breast cancers), neurodegenerative disorders 

CC (e.g. Parkinson's disease, Alzheimer's disease), muscular disorders (e.g. 

CC myotonic dystrophy, catatonia), endocrine disorders (e.g. diabetes, 

CC Grave's disease), gastrointestinal disorders (e.g. Crohn's disease), 

CC renal disorders (e.g. Goodpasture's syndrome), viral, bacterial, fungal,

CC parasitic, protozoal and helminthic infections, cardiovascular disorders 

CC (e.g. atherosclerosis), or hepatic diseases (e.g. cirrhosis) and many 

CC other diseases and disorders detailed in the specification. They can also 

CC be used in assessing the effects of exogenous compounds on the expression 

CC of nucleic acid and amino acid sequences of transporters and ion 

CC channels. TRICH or its fragments may also be used in screening for 

CC compounds that specifically bind to and modulate the activity of TRICH. 

CC The polynucleotides can be used to create knock-in humanised animals or 

CC transgenic animals to model human disease. The present sequence 

CC represents a TRICH protein 

XX 

SQ Sequence 374 AA; 

Query Match 55.9%; Score 1961; DB 5; Length 374; 

Best Local Similarity 99.7%; Pred. No. 1.8e-192; 

Matches 373; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 



Qy 300 MVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDF 359

II | | I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I II I I I I I I I M I I
Db 1 MVHYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDF 60

Qy 360 LWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 419

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I M I I I
Db 61 LWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 120

Qy 420 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLY 479

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 121 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLY 180

Qy 480 YELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLW 539

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I
Db 181 YELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLW 240

Qy 540 FCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWC 599

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 241 FCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWC 300

Qy 600 FEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYY 659

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I
Db 301 FEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYY 360

Qy 660 VSLRFIKQKPSQDW 673

I I I I I I I I I I I I I I
Db 361 VSLRFIKQKPSQDW 374



RESULT 5 
AAG18079 
ID AAG18079 standard; protein; 632 AA.
XX
AC AAG18079;
XX
DT 17-OCT-2000 (first entry)
XX
DE Arabidopsis thaliana protein fragment SEQ ID NO: 19344.
XX
KW Protein identification; signal transduction pathway; metabolic pathway;
KW hybridisation assay; genetic mapping; gene expression control; promoter;
KW termination sequence.
XX
OS Arabidopsis thaliana.
XX
PN EP1033405-A2.
XX
PD 06-SEP-2000.
XX
PF 25-FEB-2000; 2000EP-00301439.
XX
PR 25-FEB-1999; 99US-0121825P.
PR 05-MAR-1999; 99US-0123180P.
PR 09-MAR-1999; 99US-0123548P.
PR 23-MAR-1999; 99US-0125788P.
PR 25-MAR-1999; 99US-0126264P.
PR 29-MAR-1999; 99US-0126785P.
PR 01-APR-1999; 99US-0127462P.
PR 06-APR-1999; 99US-0128234P.



PR 08-APR-1999; 99US-0128714P.
PR 16-APR-1999; 99US-0129845P.
PR 19-APR-1999; 99US-0130077P.
PR 21-APR-1999; 99US-0130449P.
PR 23-APR-1999; 99US-0130510P.
PR 23-APR-1999; 99US-0130891P.
PR 28-APR-1999; 99US-0131449P.
PR 30-APR-1999; 99US-0132048P.
PR 30-APR-1999; 99US-0132407P.
PR 04-MAY-1999; 99US-0132484P.
PR 05-MAY-1999; 99US-0132485P.
PR 06-MAY-1999; 99US-0132486P.
PR 06-MAY-1999; 99US-0132487P.
PR 07-MAY-1999; 99US-0132863P.
PR 11-MAY-1999; 99US-0134256P.
PR 14-MAY-1999; 99US-0134218P.
PR 14-MAY-1999; 99US-0134219P.
PR 14-MAY-1999; 99US-0134221P.
PR 14-MAY-1999; 99US-0134370P.
PR 18-MAY-1999; 99US-0134768P.
PR 19-MAY-1999; 99US-0134941P.
PR 20-MAY-1999; 99US-0135124P.
PR 21-MAY-1999; 99US-0135353P.
PR 24-MAY-1999; 99US-0135629P.
PR 25-MAY-1999; 99US-0136021P.
PR 27-MAY-1999; 99US-0136392P.
PR 28-MAY-1999; 99US-0136782P.
PR 01-JUN-1999; 99US-0137222P.
PR 03-JUN-1999; 99US-0137528P.
PR 04-JUN-1999; 99US-0137502P.
PR 07-JUN-1999; 99US-0137724P.
PR 08-JUN-1999; 99US-0138094P.
PR 10-JUN-1999; 99US-0138540P.
PR 10-JUN-1999; 99US-0138847P.
PR 14-JUN-1999; 99US-0139119P.
PR 16-JUN-1999; 99US-0139452P.
PR 16-JUN-1999; 99US-0139453P.
PR 17-JUN-1999; 99US-0139492P.
PR 18-JUN-1999; 99US-0139454P.
PR 18-JUN-1999; 99US-0139455P.
PR 18-JUN-1999; 99US-0139456P.
PR 18-JUN-1999; 99US-0139457P.
PR 18-JUN-1999; 99US-0139458P.
PR 18-JUN-1999; 99US-0139459P.
PR 18-JUN-1999; 99US-0139460P.
PR 18-JUN-1999; 99US-0139461P.
PR 18-JUN-1999; 99US-0139462P.
PR 18-JUN-1999; 99US-0139463P.
PR 18-JUN-1999; 99US-0139750P.
PR 18-JUN-1999; 99US-0139763P.
PR 21-JUN-1999; 99US-0139817P.
PR 22-JUN-1999; 99US-0139899P.
PR 23-JUN-1999; 99US-0140353P.
PR 23-JUN-1999; 99US-0140354P.
PR 24-JUN-1999; 99US-0140695P.
PR 28-JUN-1999; 99US-0140823P.
PR 29-JUN-1999; 99US-0140991P.



PR 30-JUN-1999; 99US-0141287P . 

PR 01-JUL-1999; 99US-0141842P . 

PR 01-JUL-1999; 99US-0142154P . 

PR 02-JUL-1999; 99US-0142055P . 

PR 06-JUL-1999; 99US-0142390P . 

PR 08-JUL-1999; 99US-0142803P . 

PR 09-JUL-1999; 99US-0142920P . 

PR 12-JUL-1999; 99US-0142977P . 

PR 13-JUL-1999; 99US-0143542P . 

PR 14-JUL-1999; 99US-0143624P . 

PR 15-JUL-1999; 99US-0144005P . 

PR 16-JUL-1999; 99US-0144085P . 

PR 16-JUL-1999; 99US-0144086P .

PR 19-JUL-1999; 99US-0144325P . 

PR 19-JUL-1999; 99US-0144331P . 

PR 19-JUL-1999; 99US-0144332P . 

PR 19-JUL-1999; 99US-0144333P . 

PR 19-JUL-1999; 99US-0144334P . 

PR 19-JUL-1999; 99US-0144335P . 

PR 20-JUL-1999; 99US-0144352P . 

PR 20-JUL-1999; 99US-0144632P . 

PR 20-JUL-1999; 99US-0144884P . 

PR 21-JUL-1999; 99US-0144814P . 

PR 21-JUL-1999; 99US-0145086P . 

PR 21-JUL-1999; 99US-0145088P . 

PR 22-JUL-1999; 99US-0145085P . 

PR 22-JUL-1999; 99US-0145087P . 

PR 22-JUL-1999; 99US-0145089P . 

PR 22-JUL-1999; 99US-0145192P . 

PR 23-JUL-1999; 99US-0145145P . 

PR 23-JUL-1999; 99US-0145218P . 

PR 23-JUL-1999; 99US-0145224P . 

PR 26-JUL-1999; 99US-0145276P . 

PR 27-JUL-1999; 99US-0145913P . 

PR 27-JUL-1999; 99US-0145918P . 

PR 27-JUL-1999; 99US-0145919P . 

PR 28-JUL-1999; 99US-0145951P . 

PR 02-AUG-1999; 99US-0146386P . 

PR 02-AUG-1999; 99US-0146388P . 

PR 02-AUG-1999; 99US-0146389P . 

PR 03-AUG-1999; 99US-0147038P. 

PR 04-AUG-1999; 99US-0147204P.

PR 04-AUG-1999; 99US-0147302P.

PR 05-AUG-1999; 99US-0147192P.

PR 05-AUG-1999; 99US-0147260P.

PR 06-AUG-1999; 99US-0147303P.

PR 06-AUG-1999; 99US-0147416P.

PR 09-AUG-1999; 99US-0147493P.

PR 09-AUG-1999; 99US-0147935P.

PR 10-AUG-1999; 99US-0148171P.

PR 11-AUG-1999; 99US-0148319P.

PR 12-AUG-1999; 99US-0148341P.

PR 13-AUG-1999; 99US-0148565P.

PR 13-AUG-1999; 99US-0148684P.

PR 16-AUG-1999; 99US-0149368P.

PR 17-AUG-1999; 99US-0149175P.

PR 18-AUG-1999; 99US-0149426P.



PR 20-AUG-1999; 99US-0149722P.
PR 20-AUG-1999; 99US-0149723P.
PR 20-AUG-1999; 99US-0149929P.
PR 23-AUG-1999; 99US-0149902P.
PR 23-AUG-1999; 99US-0149930P.
PR 25-AUG-1999; 99US-0150566P.
PR 26-AUG-1999; 99US-0150884P.
PR 27-AUG-1999; 99US-0151065P.
PR 27-AUG-1999; 99US-0151066P.
PR 27-AUG-1999; 99US-0151080P.
PR 30-AUG-1999; 99US-0151303P.
PR 31-AUG-1999; 99US-0151438P.
PR 01-SEP-1999; 99US-0151930P.
PR 07-SEP-1999; 99US-0152363P.
PR 10-SEP-1999; 99US-0153070P.
PR 13-SEP-1999; 99US-0153758P.
PR 15-SEP-1999; 99US-0154018P.
PR 16-SEP-1999; 99US-0154039P.
PR 20-SEP-1999; 99US-0154779P.
PR 22-SEP-1999; 99US-0155139P.
PR 23-SEP-1999; 99US-0155486P.
PR 24-SEP-1999; 99US-0155659P.
PR 28-SEP-1999; 99US-0156458P.
PR 29-SEP-1999; 99US-0156596P.
PR 04-OCT-1999; 99US-0157117P.
PR 05-OCT-1999; 99US-0157753P.
PR 06-OCT-1999; 99US-0157865P.
PR 07-OCT-1999; 99US-0158029P.
PR 08-OCT-1999; 99US-0158232P.
PR 12-OCT-1999; 99US-0158369P.
PR 13-OCT-1999; 99US-0159293P.
PR 13-OCT-1999; 99US-0159294P.
PR 13-OCT-1999; 99US-0159295P.
PR 14-OCT-1999; 99US-0159329P.
PR 14-OCT-1999; 99US-0159330P.
PR 14-OCT-1999; 99US-0159331P.
PR 14-OCT-1999; 99US-0159637P.
PR 14-OCT-1999; 99US-0159638P.
PR 18-OCT-1999; 99US-0159584P.
PR 21-OCT-1999; 99US-0160741P.
PR 21-OCT-1999; 99US-0160767P.
PR 21-OCT-1999; 99US-0160768P.
PR 21-OCT-1999; 99US-0160770P.
PR 21-OCT-1999; 99US-0160814P.
PR 21-OCT-1999; 99US-0160815P.
PR 22-OCT-1999; 99US-0160980P.
PR 22-OCT-1999; 99US-0160981P.
PR 22-OCT-1999; 99US-0160989P.
PR 25-OCT-1999; 99US-0161404P.
PR 25-OCT-1999; 99US-0161405P.
PR 25-OCT-1999; 99US-0161406P.
PR 26-OCT-1999; 99US-0161359P.
PR 26-OCT-1999; 99US-0161360P.
PR 26-OCT-1999; 99US-0161361P.
PR 28-OCT-1999; 99US-0161920P.
PR 28-OCT-1999; 99US-0161992P.
PR 28-OCT-1999; 99US-0161993P.



PR 29-OCT-1999; 99US-0162142P . 

Query Match 20.8%; Score 730.5; DB 3; Length 632; 

Best Local Similarity 30.7%; Pred. No. 2.1e-65; 

Matches 211; Conservative 117; Mismatches 269; Indels 91; Gaps 19; 

Qy 9 RGLPKGATPQDTSGLQDRLFSSES — DNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFE 66 

: I I I : I I I : I : I : I I I I : : : I : I : I 
Db 3 QGLPDMSDTQSKSVLAFPTITSQPGLQMSMY PITLKFEEWYKVKI — E 4 9 

Qy 67 QLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGK 126 

| : | || : : : : I I : I I : : I I I I : : I I : I I I 

Db 50 QTSQCMGSWKSKE — KTILNGITGMVCPGEFLAMLGPSGSGKTTLLSALGGR — LSK 102 

Qy 127 IKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDK 186 

II:: I I I I I : I : | | : I I : I I I I I I I I : I I I : : : : : : 

Db 103 TFSGKVMYNGQPFSGCIKRR-TGFVAQDDVLYPHLTVWETLFFTALLRLPSSLTRDEKAE 161 

Qy 187 RVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSF 246 

| : | | | | | | : I : : : I I I : I I I I : : I II I I : : I II : I : I I I I I I II I I 

Db 162 HVDRVIAELGLNRCTNSMIGGPLFRGISGGEKKRVSIGQEMLINPSLLLLDEPTSGLDST 221 

Qy 247 TAHNLWTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTA 306 

Ml : I I : III I I I : : : I I I I I : : II I : I :: I : I I I I I I I : I I : : 
Db 222 TAHRIWTIKRLASGGRTVOTTIHQPSSRIYHMFDKVVLLSEGSPIYYGAASSAVEYFSS 281 

Qy 307 IGYPCPRYSNPADFYVDLTS IDRRSREQELATREKAQSLAALFLEKVRDLDDFLW 361 

: \ : | I II : I I : : : I I I I : : : I : : : : 
Db 282 LGFSTSLTVNPADLLLDLANGIPPDTQKETSEQEQKTVK — ETLVSAYEKNI 331 

Qy 362 KAETKDLDEDTCVESS VTPLDTNCLPSPTKMPGAVQQFTTLIRRQI-SNDFRDLPT 416 

111:11 I II III l::l : I 

Db 332 --STK-LKAELCNAESHSYEYTKAAAKNLKSEQWCTTWWYQFTVLLQRGVRERRFESFNK 388 

Qy 417 LLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERA 476 

|| : : I I : I : : I II I I I : : : I : 

Db 389 LRIF QVISVAFLGGLLWWH-TPKSHIQDRTALLFFFSVFWGFYPLYNAVFTFPQEKR 444 

Qy 477 MLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVW 536 

II I I : I I I I : : I : I I I : II: hi hi hi 

Db 445 MLIKERSSGMYRLSSYFMARNVGDLPLELALPTAFVFIIYWMGGLKPDPTTFILSLLVVL 504 

Qy 537 LWFCCRIMALAAAALLPT FHMAS FFSNALYNS FYLAGGFMINLS S LWTVP AWISKV 593 

| : : I I I I I h : : I : M I : : : I h : 

Db 505 YS VLVAQGLGLAFGALLMN I KQATT LAS VTT LVFL I AGGY YVQ QIPPFIVWLKYL 559 

Qy 594 SFLRWCFEGLMKIQFSRRTY KMPLGNLT I AVS GDKI LS AMEL 635 

|: :h: h lh: I I I II I I : : I 

Db 560 SYSYYCYKLLLGIQYTDDDYYECSKGVWCRVGDFPAIKSMGLNNLWI DVFVMGVML 615 

Qy 636 DSYPLYAIYLIVIGLSGGFMVLYYVSLR 663 

III : I I : I I I 

Db 616 VGYRLMA YMALHRVKLR 632 



RESULT 6 
AAG18078 



ID AAG18078 standard; protein; 648 AA.
XX
AC AAG18078;
XX
DT 17-OCT-2000 (first entry)
XX
DE Arabidopsis thaliana protein fragment SEQ ID NO: 19343.
XX
KW Protein identification; signal transduction pathway; metabolic pathway;
KW hybridisation assay; genetic mapping; gene expression control; promoter;
KW termination sequence.
XX
OS Arabidopsis thaliana.
XX
PN EP1033405-A2.
XX
PD 06-SEP-2000.
XX
PF 25-FEB-2000; 2000EP-00301439.
XX
PR 25-FEB-1999; 99US-0121825P.
PR 05-MAR-1999; 99US-0123180P.
PR 09-MAR-1999; 99US-0123548P.
PR 23-MAR-1999; 99US-0125788P.
PR 25-MAR-1999; 99US-0126264P.
PR 29-MAR-1999; 99US-0126785P.
PR 01-APR-1999; 99US-0127462P.
PR 06-APR-1999; 99US-0128234P.
PR 08-APR-1999; 99US-0128714P.
PR 16-APR-1999; 99US-0129845P.
PR 19-APR-1999; 99US-0130077P.
PR 21-APR-1999; 99US-0130449P.
PR 23-APR-1999; 99US-0130510P.
PR 23-APR-1999; 99US-0130891P.
PR 28-APR-1999; 99US-0131449P.
PR 30-APR-1999; 99US-0132048P.
PR 30-APR-1999; 99US-0132407P.
PR 04-MAY-1999; 99US-0132484P.
PR 05-MAY-1999; 99US-0132485P.
PR 06-MAY-1999; 99US-0132486P.
PR 06-MAY-1999; 99US-0132487P.
PR 07-MAY-1999; 99US-0132863P.
PR 11-MAY-1999; 99US-0134256P.
PR 14-MAY-1999; 99US-0134218P.
PR 14-MAY-1999; 99US-0134219P.
PR 14-MAY-1999; 99US-0134221P.
PR 14-MAY-1999; 99US-0134370P.
PR 18-MAY-1999; 99US-0134768P.
PR 19-MAY-1999; 99US-0134941P.
PR 20-MAY-1999; 99US-0135124P.
PR 21-MAY-1999; 99US-0135353P.
PR 24-MAY-1999; 99US-0135629P.
PR 25-MAY-1999; 99US-0136021P.
PR 27-MAY-1999; 99US-0136392P.
PR 28-MAY-1999; 99US-0136782P.
PR 01-JUN-1999; 99US-0137222P.
PR 03-JUN-1999; 99US-0137528P.



PR 04-JUN-1999; 99US-0137502P.
PR 07-JUN-1999; 99US-0137724P.
PR 08-JUN-1999; 99US-0138094P.
PR 10-JUN-1999; 99US-0138540P.
PR 10-JUN-1999; 99US-0138847P.
PR 14-JUN-1999; 99US-0139119P.
PR 16-JUN-1999; 99US-0139452P.
PR 16-JUN-1999; 99US-0139453P.
PR 17-JUN-1999; 99US-0139492P.
PR 18-JUN-1999; 99US-0139454P.
PR 18-JUN-1999; 99US-0139455P.
PR 18-JUN-1999; 99US-0139456P.
PR 18-JUN-1999; 99US-0139457P.
PR 18-JUN-1999; 99US-0139458P.
PR 18-JUN-1999; 99US-0139459P.
PR 18-JUN-1999; 99US-0139460P.
PR 18-JUN-1999; 99US-0139461P.
PR 18-JUN-1999; 99US-0139462P.
PR 18-JUN-1999; 99US-0139463P.
PR 18-JUN-1999; 99US-0139750P.
PR 18-JUN-1999; 99US-0139763P.
PR 21-JUN-1999; 99US-0139817P.
PR 22-JUN-1999; 99US-0139899P.
PR 23-JUN-1999; 99US-0140353P.
PR 23-JUN-1999; 99US-0140354P.
PR 24-JUN-1999; 99US-0140695P.
PR 28-JUN-1999; 99US-0140823P.
PR 29-JUN-1999; 99US-0140991P.
PR 30-JUN-1999; 99US-0141287P.
PR 01-JUL-1999; 99US-0141842P.
PR 01-JUL-1999; 99US-0142154P.
PR 02-JUL-1999; 99US-0142055P.
PR 06-JUL-1999; 99US-0142390P.
PR 08-JUL-1999; 99US-0142803P.
PR 09-JUL-1999; 99US-0142920P.
PR 12-JUL-1999; 99US-0142977P.
PR 13-JUL-1999; 99US-0143542P.
PR 14-JUL-1999; 99US-0143624P.
PR 15-JUL-1999; 99US-0144005P.
PR 16-JUL-1999; 99US-0144085P.
PR 16-JUL-1999; 99US-0144086P.
PR 19-JUL-1999; 99US-0144325P.
PR 19-JUL-1999; 99US-0144331P.
PR 19-JUL-1999; 99US-0144332P.
PR 19-JUL-1999; 99US-0144333P.
PR 19-JUL-1999; 99US-0144334P.
PR 19-JUL-1999; 99US-0144335P.
PR 20-JUL-1999; 99US-0144352P.
PR 20-JUL-1999; 99US-0144632P.
PR 20-JUL-1999; 99US-0144884P.
PR 21-JUL-1999; 99US-0144814P.
PR 21-JUL-1999; 99US-0145086P.
PR 21-JUL-1999; 99US-0145088P.
PR 22-JUL-1999; 99US-0145085P.
PR 22-JUL-1999; 99US-0145087P.
PR 22-JUL-1999; 99US-0145089P.
PR 22-JUL-1999; 99US-0145192P.



PR 23-JUL-1999; 99US-0145145P . 

PR 23-JUL-1999; 99US-0145218P . 

PR 23-JUL-1999; 99US-0145224P . 

PR 26-JUL-1999; 99US-0145276P . 

PR 27-JUL-1999; 99US-0145913P . 

PR 27-JUL-1999; 99US-0145918P . 

PR 27-JUL-1999; 99US-0145919P . 

PR 28-JUL-1999; 99US-0145951P . 

PR 02-AUG-1999; 99US-0146386P . 

PR 02-AUG-1999; 99US-0146388P . 

PR 02-AUG-1999; 99US-0146389P .

PR 03-AUG-1999; 99US-0147038P .

PR 04-AUG-1999; 99US-0147204P . 

PR 04-AUG-1999; 99US-0147302P . 

PR 05-AUG-1999; 99US-0147192P . 

PR 05-AUG-1999; 99US-0147260P . 

PR 06-AUG-1999; 99US-0147303P . 

PR 06-AUG-1999; 99US-0147416P . 

PR 09-AUG-1999; 99US-0147493P .

PR 09-AUG-1999; 99US-0147935P . 

PR 10-AUG-1999; 99US-0148171P .

PR 11-AUG-1999; 99US-0148319P .

PR 12-AUG-1999; 99US-0148341P . 

PR 13-AUG-1999; 99US-0148565P .

PR 13-AUG-1999; 99US-0148684P . 

PR 16-AUG-1999; 99US-0149368P .

PR 17-AUG-1999; 99US-0149175P . 

PR 18-AUG-1999; 99US-0149426P . 

PR 20-AUG-1999; 99US-0149722P . 

PR 20-AUG-1999; 99US-0149723P .

PR 20-AUG-1999; 99US-0149929P . 

PR 23-AUG-1999; 99US-0149902P .

PR 23-AUG-1999; 99US-0149930P . 

PR 25-AUG-1999; 99US-0150566P . 

PR 26-AUG-1999; 99US-0150884P .

PR 27-AUG-1999; 99US-0151065P . 

PR 27-AUG-1999; 99US-0151066P . 

PR 27-AUG-1999; 99US-0151080P . 

PR 30-AUG-1999; 99US-0151303P . 

PR 31-AUG-1999; 99US-0151438P .

PR 01-SEP-1999; 99US-0151930P . 

PR 07-SEP-1999; 99US-0152363P . 

PR 10-SEP-1999; 99US-0153070P .

PR 13-SEP-1999; 99US-0153758P . 

PR 15-SEP-1999; 99US-0154018P . 

PR 16-SEP-1999; 99US-0154039P . 

PR 20-SEP-1999; 99US-0154779P . 

PR 22-SEP-1999; 99US-0155139P . 

PR 23-SEP-1999; 99US-0155486P . 

PR 24-SEP-1999; 99US-0155659P . 

PR 28-SEP-1999; 99US-0156458P . 

PR 29-SEP-1999; 99US-0156596P . 

PR 04-OCT-1999; 99US-0157117P . 

PR 05-OCT-1999; 99US-0157753P . 

PR 06-OCT-1999; 99US-0157865P . 

PR 07-OCT-1999; 99US-0158029P .

PR 08-OCT-1999; 99US-0158232P . 



PR 12-OCT-1999; 99US-0158369P.
PR 13-OCT-1999; 99US-0159293P.
PR 13-OCT-1999; 99US-0159294P.
PR 13-OCT-1999; 99US-0159295P.
PR 14-OCT-1999; 99US-0159329P.
PR 14-OCT-1999; 99US-0159330P.
PR 14-OCT-1999; 99US-0159331P.
PR 14-OCT-1999; 99US-0159637P.
PR 14-OCT-1999; 99US-0159638P.
PR 18-OCT-1999; 99US-0159584P.
PR 21-OCT-1999; 99US-0160741P.
PR 21-OCT-1999; 99US-0160767P.
PR 21-OCT-1999; 99US-0160768P.
PR 21-OCT-1999; 99US-0160770P.
PR 21-OCT-1999; 99US-0160814P.
PR 21-OCT-1999; 99US-0160815P.
PR 22-OCT-1999; 99US-0160980P.
PR 22-OCT-1999; 99US-0160981P.
PR 22-OCT-1999; 99US-0160989P.
PR 25-OCT-1999; 99US-0161404P.
PR 25-OCT-1999; 99US-0161405P.
PR 25-OCT-1999; 99US-0161406P.
PR 26-OCT-1999; 99US-0161359P.
PR 26-OCT-1999; 99US-0161360P.
PR 26-OCT-1999; 99US-0161361P.
PR 28-OCT-1999; 99US-0161920P.
PR 28-OCT-1999; 99US-0161992P.
PR 28-OCT-1999; 99US-0161993P.
PR 29-OCT-1999; 99US-0162142P.



Query Match 20.8%; Score 730.5; DB 3; Length 648; 

Best Local Similarity 30.7%; Pred. No. 2.2e-65; 

Matches 211; Conservative 117; Mismatches 269; Indels 91; Gaps 19; 
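
The percentages in the summary lines above appear to follow from the raw counts. The following is a minimal Python sketch under two assumptions inferred from the figures in this listing (not taken from the search program's documentation): Query Match is the hit score divided by the query's self-match score of 3506 reported for RESULT 1, and Best Local Similarity is Matches divided by the sum of Matches, Conservative, Mismatches and Indels. The function name `summary_percentages` is hypothetical.

```python
def summary_percentages(score, self_score, matches, conservative,
                        mismatches, indels):
    """Recompute the two percentages printed in a hit's summary block.

    Assumption: every alignment column is counted exactly once as a match,
    a conservative substitution, a mismatch, or an indel.
    """
    aligned_columns = matches + conservative + mismatches + indels
    query_match = 100.0 * score / self_score
    best_local_similarity = 100.0 * matches / aligned_columns
    return round(query_match, 1), round(best_local_similarity, 1)


# Figures for this hit (AAG18078); 3506 is the 100% self-match score.
print(summary_percentages(730.5, 3506, 211, 117, 269, 91))
```

With the figures for this hit the sketch reproduces the printed 20.8% and 30.7%, which suggests the assumed definitions are at least consistent with the report.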



Qy 9 RGLPKGATPQDTSGLQDRLFSSES — DNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFE 66

: M 1 : 1 1 1 : 1 : 1 : 1 1 1 1 : : : 1 : 1 : 1

Db 19 QGLPDMSDTQSKSVLAFPTITSQPGLQMSMY PITLKFEEWYKVKI E 65

Qy 67 QLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGK 126

|:| || : : :: I 1: M::| II 1: :M : II 1

Db 66 QTSQCMGSWKSKE KTILNGITGMVCPGEFLAMLGPSGSGKTTLLSALGGR — LSK 118

Qy 127 IKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDK 186

||:: | | | | I : I : I 1 : 1 1 : 1 1 1 1 1 1 1 1 : 1 1 1 : : : : : :

Db 119 TFSGKVMYNGQPFSGCIKRR-TGFVAQDDVLYPHLTVWETLFFTALLRLPSSLTRDEKAE 177

Qy 187 RVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSF 246

| : | | | | | | : | : : : I 1 1 : II 1 1 : : 1 1 1 1 1 : : 1 II : 1 : 1 II 1 1 1 1 1 II

Db 178 HVDRVIAELGLNRCTNSMIGGPLFRGISGGEKKRVSIGQEMLINPSLLLLDEPTSGLDST 237

Qy 247 TAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTA 306

| | | : I I : 1 1 1 1 1 1 : : : 1 1 1 1 1 : : 1 1 1 : 1 : : 1 : 1 1 1 1 M 1 : 1 1 : :

Db 238 TAHRIVTTIKRIASGGRTVVTTIHQPSSRIYHMFDKVVLLSEGSPIYYGAASSAVEYFSS 297

Qy 307 IGYPCPRYSNPADFYVDLTS IDRRSREQELATREKAQSLAALFLEKVRDLDDFLW 361

: | : 1 II 1 : 1 1 : : : 1 1 1 1 : : : 1 : : : :

Db 298 LGFSTSLTVNPADLLLDLANGIPPDTQKETSEQEQKTVK — ETLVSAYEKNI 347



Qy 362 KAETKDLDEDTCVESS VTPLDTNCLPSPTKMPGAVQQFTTLIRRQI-SNDFRDLPT 416 

I I I : I I I II | | | | : : | : I 

Db 348 — STK-LKAELCNAESHSYEYTKAAAKNLKSEQWCTTWWYQFTVLLQRGVRERRFESFNK 404 

Qy 417 LLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERA 476

II : : | | : | : : I I I I I I : : : I : 

Db 405 LRIF QVISVAFLGGLLWWH-TPKSHIQDRTALLFFFSVFWGFYPLYNAVFTFPQEKR 460

Qy 477 MLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVW 536 

II I I : I I I I : : I : I I I : I I : hi I : I hi 

Db 461 MLIKERSSGMYRLSSYFMARNVGDLPLELALPTAFVFIIYWMGGLKPDPTTFILSLLVVL 520

Qy 537 LVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVP AWISKV 593

I : : I I I I I I : : : I : I I I : : : I I : : 

Db 521 YSVLVAQGLGLAFGALLMNIKQATTLASVTTLVFLIAGGYYVQ QIPPFIVWLKYL 575 

Qy 594 SFLRWCFEGLMKIQFSRRTY KMPLGNLTIAVSGDKILSAMEL 635 

|: :|:: |: II:: I I I II I I : .: I 

Db 576 SYSYYCYKLLLGIQYTDDDYYECSKGVWCRVGDFPAIKSMGLNNLWI DVFVMGVML 631 

Qy 636 DSYPLYAIYLIVIGLSGGFMVLYYVSLR 663 

III : I I : I I I 

Db 632 VGYRLMA YMALHRVKLR 648



RESULT 7 
AAG18080 

ID AAG18080 standard; protein; 625 AA. 
XX 
AC AAG18080;
XX
DT 17-OCT-2000 (first entry)
XX
DE Arabidopsis thaliana protein fragment SEQ ID NO: 19345.
XX
KW Protein identification; signal transduction pathway; metabolic pathway;
KW hybridisation assay; genetic mapping; gene expression control; promoter;
KW termination sequence.
XX
OS Arabidopsis thaliana.
XX
PN EP1033405-A2.
XX
PD 06-SEP-2000.
XX
PF 25-FEB-2000; 2000EP-00301439.
XX
PR 25-FEB-1999; 99US-0121825P.
PR 05-MAR-1999; 99US-0123180P.
PR 09-MAR-1999; 99US-0123548P.
PR 23-MAR-1999; 99US-0125788P.
PR 25-MAR-1999; 99US-0126264P.
PR 29-MAR-1999; 99US-0126785P.
PR 01-APR-1999; 99US-0127462P.
PR 06-APR-1999; 99US-0128234P.
PR 08-APR-1999; 99US-0128714P.



PR 16-APR-1999; 99US-0129845P . 

PR 19-APR-1999; 99US-0130077P . 

PR 21-APR-1999; 99US-0130449P . 

PR 23-APR-1999; 99US-0130510P . 

PR 23-APR-1999; 99US-0130891P . 

PR 28-APR-1999; 99US-0131449P . 

PR 30-APR-1999; 99US-0132048P . 

PR 30-APR-1999; 99US-0132407P . 

PR 04-MAY-1999; 99US-0132484P . 

PR 05-MAY-1999; 99US-0132485P . 

PR 06-MAY-1999; 99US-0132486P . 

PR 06-MAY-1999; 99US-0132487P .

PR 07-MAY-1999; 99US-0132863P . 

PR 11-MAY-1999; 99US-0134256P .

PR 14-MAY-1999; 99US-0134218P . 

PR 14-MAY-1999; 99US-0134219P . 

PR 14-MAY-1999; 99US-0134221P . 

PR 14-MAY-1999; 99US-0134370P . 

PR 18-MAY-1999; 99US-0134768P . 

PR 19-MAY-1999; 99US-0134941P . 

PR 20-MAY-1999; 99US-0135124P . 

PR 21-MAY-1999; 99US-0135353P . 

PR 24-MAY-1999; 99US-0135629P . 

PR 25-MAY-1999; 99US-0136021P . 

PR 27-MAY-1999; 99US-0136392P . 

PR 28-MAY-1999; 99US-0136782P . 

PR 01-JUN-1999; 99US-0137222P . 

PR 03-JUN-1999; 99US-0137528P . 

PR 04-JUN-1999; 99US-0137502P . 

PR 07-JUN-1999; 99US-0137724P . 

PR 08-JUN-1999; 99US-0138094P . 

PR 10-JUN-1999; 99US-0138540P . 

PR 10-JUN-1999; 99US-0138847P . 

PR 14-JUN-1999; 99US-0139119P . 

PR 16-JUN-1999; 99US-0139452P . 

PR 16-JUN-1999; 99US-0139453P . 

PR 17-JUN-1999; 99US-0139492P . 

PR 18-JUN-1999; 99US-0139454P . 

PR 18-JUN-1999; 99US-0139455P . 

PR 18-JUN-1999; 99US-0139456P . 

PR 18-JUN-1999; 99US-0139457P . 

PR 18-JUN-1999; 99US-0139458P . 

PR 18-JUN-1999; 99US-0139459P . 

PR 18-JUN-1999; 99US-0139460P . 

PR 18-JUN-1999; 99US-0139461P. 

PR 18-JUN-1999; 99US-0139462P. 

PR 18-JUN-1999; 99US-0139463P .

PR 18-JUN-1999; 99US-0139750P . 

PR 18-JUN-1999; 99US-0139763P . 

PR 21-JUN-1999; 99US-0139817P. 

PR 22-JUN-1999; 99US-0139899P. 

PR 23-JUN-1999; 99US-0140353P . 

PR 23-JUN-1999; 99US-0140354P . 

PR 24-JUN-1999; 99US-0140695P . 

PR 28-JUN-1999; 99US-0140823P . 

PR 29-JUN-1999; 99US-0140991P . 

PR 30-JUN-1999; 99US-0141287P . 



PR 01-JUL-1999; 99US-0141842P . 

PR 01-JUL-1999; 99US-0142154P . 

PR 02-JUL-1999; 99US-0142055P .

PR 06-JUL-1999; 99US-0142390P . 

PR 08-JUL-1999; 99US-0142803P . 

PR 09-JUL-1999; 99US-0142920P . 

PR 12-JUL-1999; 99US-0142977P .

PR 13-JUL-1999; 99US-0143542P .

PR 14-JUL-1999; 99US-0143624P . 

PR 15-JUL-1999; 99US-0144005P . 

PR 16-JUL-1999; 99US-0144085P . 

PR 16-JUL-1999; 99US-0144086P . 

PR 19-JUL-1999; 99US-0144325P . 

PR 19-JUL-1999; 99US-0144331P . 

PR 19-JUL-1999; 99US-0144332P .

PR 19-JUL-1999; 99US-0144333P . 

PR 19-JUL-1999; 99US-0144334P . 

PR 19-JUL-1999; 99US-0144335P . 

PR 20-JUL-1999; 99US-0144352P . 

PR 20-JUL-1999; 99US-0144632P . 

PR 20-JUL-1999; 99US-0144884P .

PR 21-JUL-1999; 99US-0144814P . 

PR 21-JUL-1999; 99US-0145086P . 

PR 21-JUL-1999; 99US-0145088P . 

PR 22-JUL-1999; 99US-0145085P . 

PR 22-JUL-1999; 99US-0145087P . 

PR 22-JUL-1999; 99US-0145089P . 

PR 22-JUL-1999; 99US-0145192P . 

PR 23-JUL-1999; 99US-0145145P .

PR 23-JUL-1999; 99US-0145218P . 

PR 23-JUL-1999; 99US-0145224P . 

PR 26-JUL-1999; 99US-0145276P . 

PR 27-JUL-1999; 99US-0145913P . 

PR 27-JUL-1999; 99US-0145918P . 

PR 27-JUL-1999; 99US-0145919P . 

PR 28-JUL-1999; 99US-0145951P . 

PR 02-AUG-1999; 99US-0146386P . 

PR 02-AUG-1999; 99US-0146388P . 

PR 02-AUG-1999; 99US-0146389P .

PR 03-AUG-1999; 99US-0147038P . 

PR 04-AUG-1999; 99US-0147204P . 

PR 04-AUG-1999; 99US-0147302P . 

PR 05-AUG-1999; 99US-0147192P . 

PR 05-AUG-1999; 99US-0147260P . 

PR 06-AUG-1999; 99US-0147303P . 

PR 06-AUG-1999; 99US-0147416P . 

PR 09-AUG-1999; 99US-0147493P . 

PR 09-AUG-1999; 99US-0147935P . 

PR 10-AUG-1999; 99US-0148171P .

PR 11-AUG-1999; 99US-0148319P .

PR 12-AUG-1999; 99US-0148341P .

PR 13-AUG-1999; 99US-0148565P .

PR 13-AUG-1999; 99US-0148684P .

PR 16-AUG-1999; 99US-0149368P .

PR 17-AUG-1999; 99US-0149175P .

PR 18-AUG-1999; 99US-0149426P .

PR 20-AUG-1999; 99US-0149722P .



PR 20-AUG-1999; 99US-0149723P .

PR 20-AUG-1999; 99US-0149929P . 

PR 23-AUG-1999; 99US-0149902P .

PR 23-AUG-1999; 99US-0149930P . 

PR 25-AUG-1999; 99US-0150566P . 

PR 26-AUG-1999; 99US-0150884P . 

PR 27-AUG-1999; 99US-0151065P . 

PR 27-AUG-1999; 99US-0151066P . 

PR 27-AUG-1999; 99US-0151080P . 

PR 30-AUG-1999; 99US-0151303P .

PR 31-AUG-1999; 99US-0151438P . 

PR 01-SEP-1999; 99US-0151930P . 

PR 07-SEP-1999; 99US-0152363P . 

PR 10-SEP-1999; 99US-0153070P . 

PR 13-SEP-1999; 99US-0153758P . 

PR 15-SEP-1999; 99US-0154018P . 

PR 16-SEP-1999; 99US-0154039P .

PR 20-SEP-1999; 99US-0154779P . 

PR 22-SEP-1999; 99US-0155139P . 

PR 23-SEP-1999; 99US-0155486P. 

PR 24-SEP-1999; 99US-0155659P . 

PR 28-SEP-1999; 99US-0156458P . 

PR 29-SEP-1999; 99US-0156596P . 

PR 04-OCT-1999; 99US-0157117P . 

PR 05-OCT-1999; 99US-0157753P . 

PR 06-OCT-1999; 99US-0157865P . 

PR 07-OCT-1999; 99US-0158029P . 

PR 08-OCT-1999; 99US-0158232P . 

PR 12-OCT-1999; 99US-0158369P . 

PR 13-OCT-1999; 99US-0159293P . 

PR 13-OCT-1999; 99US-0159294P .

PR 13-OCT-1999; 99US-0159295P . 

PR 14-OCT-1999; 99US-0159329P . 

PR 14-OCT-1999; 99US-0159330P . 

PR 14-OCT-1999; 99US-0159331P . 

PR 14-OCT-1999; 99US-0159637P . 

PR 14-OCT-1999; 99US-0159638P . 

PR 18-OCT-1999; 99US-0159584P . 

PR 21-OCT-1999; 99US-0160741P . 

PR 21-OCT-1999; 99US-0160767P . 

PR 21-OCT-1999; 99US-0160768P . 

PR 21-OCT-1999; 99US-0160770P. 

PR 21-OCT-1999; 99US-0160814P . 

PR 21-OCT-1999; 99US-0160815P . 

PR 22-OCT-1999; 99US-0160980P . 

PR 22-OCT-1999; 99US-0160981P . 

PR 22-OCT-1999; 99US-0160989P .

PR 25-OCT-1999; 99US-0161404P . 

PR 25-OCT-1999; 99US-0161405P . 

PR 25-OCT-1999; 99US-0161406P . 

PR 26-OCT-1999; 99US-0161359P . 

PR 26-OCT-1999; 99US-0161360P . 

PR 26-OCT-1999; 99US-0161361P . 

PR 28-OCT-1999; 99US-0161920P . 

PR 28-OCT-1999; 99US-0161992P . 

PR 28-OCT-1999; 99US-0161993P . 

PR 29-OCT-1999; 99US-0162142P . 
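
Each PR line above pairs a priority filing date with an application number. A minimal Python sketch of reading such lines into structured values follows. It assumes the `PR DD-MON-YYYY; NUMBER .` layout seen in this listing and treats whitespace inside the number as scanning noise to be removed; the names `PR_LINE`, `MONTHS` and `parse_priority` are hypothetical helpers, not part of any database tool.

```python
import re
from datetime import date

# Three-letter month codes as they appear in the PR, PF and PD fields.
MONTHS = {name: index + 1 for index, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split())}

# Field code, date, semicolon, application number, optional trailing period.
PR_LINE = re.compile(r"^PR\s+(\d{2})-([A-Z]{3})-(\d{4});\s*(.+?)\s*\.?\s*$")


def parse_priority(line):
    """Return (filing_date, application_number), or None if not a PR line."""
    match = PR_LINE.match(line.strip())
    if match is None or match.group(2) not in MONTHS:
        return None
    day, month, year, number = match.groups()
    filed = date(int(year), MONTHS[month], int(day))
    # Drop any whitespace that scanning introduced inside the number.
    return filed, re.sub(r"\s+", "", number)


print(parse_priority("PR 29-OCT-1999; 99US-0162142P ."))
```

Lines whose date itself was damaged in scanning (for example a day written with letters instead of digits) are returned as None rather than guessed.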



Query Match 20.7%; Score 724; DB 3; Length 625; 

Best Local Similarity 30.9%; Pred. No. 9.8e-65; 

Matches 211; Conservative 113; Mismatches 258; Indels 100; Gaps 20; 

Qy 15 ATPQDTS— GLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFK 72 

II II III I : I III: : : I : I : I I = I 

Db 11 AFPTITSQPGLQ MSMY PITLKFEEWYKVKI EQTSQCM 48 

Qy 73 MPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQI 132

II : : : : I I : I I : : I I I I : : I I : I I I II:: 

Db 49 GSWKSKE KTILNGITGMVCPGEFLAMLGPSGSGKTTLLSALGGR — LSKTFSGKV 101 

Qy 133 WINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVI 192 

I I I I I : I : I I : I I : M I I I I I I : I M :::::: I : I I 

Db 102 MYNGQPFSGCIKRR-TGFVAQDDVLYPHLTVWETLFFTALLRLPSSLTRDEKAEHVDRVI 160 

Qy 193 AELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLV 252 

I I I | : | : : : I I I : I I I I : : I I I I I : : I II : I : I I I I I I I I M III : I 

Db 161 AELGLNRCTNSMIGGPLFRGISGGEKKRVSIGQEMLINPSLLLLDEPTSGLDSTTAHRIV 220 

Qy 253 KTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCP 312 

I : Ml I I I : : : I I I I I : : I I I : I : : hill III |: I h : : h 
Db 221 TTIKRIASGGRTVVTTIHQPSSRIYHMFDKVVLLSEGSPIYYGAASSAVEYFSSLGFSTS 280

Qy 313 RYSNPADFYVDLTS IDRRSREQELATREKAQSLAALFLEKVRDLDDFLWKAETKD 367 

I I I I : I I : : : Ml | : ::| : : : : II 
Db 281 LTVNPADLLLDLANGIPPDTQKETSEQEQKTVK — ETLVSAYEKNI STK- 327

Qy 368 LDEDTCVESS VTPLDTNCLPSPTKMPGAVQQFTTLIRRQI-SNDFRDLPTLLIHGA 422 

I : I I I II I M I : : I : I II 

Db 328 LKAELCNAESHSYEYTKAAAKNLKSEQWCTTWWYQFTVLLQRGVRERRFESFNKLRIF — 385 

Qy 423 EACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYEL 482

: : | | : | : : I II I I I : : : hill 

Db 386 -QVISVAFLGGLLWWH-TPKSHIQDRTALLFFFSVFWGFYPLYNAVFTFPQEKRMLIKER 443 

Qy 483 EDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCC 542

|:| | | |: : | : | | I : I I : hi hi I : I I 

Db 444 SSGMYRLSSYFMARNVGDLPLELALPTAFVFIIYWMGGLKPDPTTFILSLLVVLYSVLVA 503

Qy 543 RIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVP AWISKVSFLRWC 599 

: : || Ml h :: I =111= : :| |: :|: :| 

Db 504 QGLGLAFGALLMNIKQATTLASVTTLVFLIAGGYYVQ QIPPFIVWLKYLSYSYYC 558 

Qy 600 FEGLMKIQFSRRTY KMPLGNLTIAVSGDKILSAMELDSYPLY 641 

: : I : I |: : I I I I I I I : : I I I 

Db 559 YKLLLGIQYTDDDYYECSKGVWCRVGDFPAIKSMGLNNLWI DVFVMGVMLVGYRLM 614 

Qy 642 AIYLIVIGLSGGFMVLYYVSLR 663 

I : | | : | | | 

Db 615 A YMALHRVKLR 625 



RESULT 8 
AAU96986 

ID AAU96986 standard; protein; 652 AA. 



XX 

AC AAU96986; 
XX 

DT 07-AUG-2003 (revised) 

DT 30-JUL-2002 (first entry) 

XX 

DE Rat ABCG5 protein. 
XX 

KW Rat; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol;

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease. 
XX 

OS Rattus sp. 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P .
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 
PA (DEAN/) DEAN M.
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR N-PSDB; ABK51686. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the rat ABCG5 protein of the invention. (Updated 

CC on 07-AUG-2003 to correct OS field.)



XX 

SQ Sequence 652 AA; 



Query Match 20.3%; Score 713; DB 5; Length 652; 

Best Local Similarity 30.0%; Pred. No. 1.4e-63; 

Matches 190; Conservative 115; Mismatches 232; Indels 96; Gaps 15; 

Qy 12 PKGAT-PQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQV- PWFEQLA 69 

|:|| | : || II : I: II ::| I II II 

Db 9 PEGARGPHNNRGSQ SSLEEGSV — TGSEARHSLGV — LNVSFSVSNRVGPW 55 

Qy 70 QFKMPWTSPSCQNSCELGI-QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIK 128 

I Ml : I : : : I : I I I : I : I I I I I : : I I I I : I I 
Db 56 WNIKSCQQKWDRKILKDVSLYIESGQTMCILGSSGSGKTTLLDAISGRLRRTGTL 110 

Qy 129 SGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRV 188 

| : : : : | | : I I : : : I : I : I I I I II I : I : I I : I I I : I 

Db 111 EGEVFVNGCELRRDQFQDCVSYLLQSDVFLSSLTVRETLRYTAMLAL-RSSSADFYDKKV 169 

Qy 189 EDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTA 248 

| | : | | | || : I I 1:111111111 I I I : I : : : I I I I I : I I I II 

Db 170 EAVTjTELSLSHVADQMIGNYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTA 229 

Qy 249 HNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIG 308

: : : I I I I : I I : I : : : : I I I I I : : I | | : : : I I : : I : I : : I I 
Db 230 NHIVLLLVELARRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCGTPEEMLGFFNNCG 289 

Qy 309 YPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDL 356 

I I I I : I I I I I I : I I I I : I : I I I : I : I : : I I : I : I : I I 

Db 290 YPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLESAFRQSDICHKILENIERTRHL 349 

Qy 357 DDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLP 415

I : : I II II : I : I I I I : 
Db 350 KTLPM VPFKTKNPPGMFCKLGVLLRRVTRNLMRNKQ 385 

Qy 416 TLLIHGAEACLMSMTIGF — LYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYS 473

: : : : : | : : I I : : : : I I I : : I : : I : : : 

Db 386 WIMRLVQNLIMGLFLIFYLLRVQNNMLKGAVQDRVGLLYQLVGATPYTGMLNAVNLFPM 445 

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFL 533

II: I : II I I I : I I I : I : I I I I : I 
Db 446 LRAVSDQESQDGLYQKWQMLLAYVLHALPFSIVATVIFSSVCYWTLGLYPEVARF 500 

Qy 534 LVWLVVFCCRIMALAAAALLPTFHMASFFSNAL YNSFYLAGG 575

: I I M : I : I : : I 

Db 501 GYFSAALLAPHLIGEFLTLVLLGMVQNPNIVNSIVALLSISGLLIGSG 548 

Qy 576 FMINLSSLWTVPAWISKVSFLRWCFEGLMKIQF 608 

I: I: : : :| ::| I h :l 

Db 549 FIRNIEEMPIPLKILGYFTFQKYCCEILWNEF 581 



RESULT 9 
AAU96990 

ID AAU96990 standard; protein; 651 AA. 
XX 

AC AAU96990; 



XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R389H protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 389

FT /note= "Wild-type Arg substituted by His" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M.
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 7; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 



CC acid sequence represents the human ABCG5 mutant R389H protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 
XX 

SQ Sequence 651 AA; 

Query Match 20.1%; Score 705; DB 5; Length 651; 

Best Local Similarity 29.0%; Pred. No. 9.5e-63; 

Matches 188; Conservative 124; Mismatches 240; Indels 96; Gaps 16; 

Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75 

11:111 II | : :|::| : :| | : ||:: : : I 

Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64 

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134 

I : : : : I I I I I : : I : I I I I I : : I I I : : I I I I I : : : : 

Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115 

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194

||: : : I : : I I : I I : I I I I I I I : I : : I : I : I I I : I I 

Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254 

|| || : I I : I : I I I I I I I I I I || : I : : : I I I M II I I I : : I 

Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234 

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314 

I | | : | I : I : : : : I I I I I : : I : I I I : : : : I I : I I : : I Mill: 

Db 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294 

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362

II I I I I : I I I I : I : I : I : I : I : : I : : : : I : : : I 

Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421 

I : : I II II : MM I M : : I 
Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITHL 390 

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475

: : I : : I MM I I M | : : I : : : I 

Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLR 446

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535 

I: 1:1111 I I II : I : II I I : I 
Db 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARF 499 

Qy 536 WLVVFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577

Mill : I : M: Ml IM 

Db 500 GYFSAALLAPHLI GEFLTLVLLGI VQNPNI WSVVALLS I AGVLVGSGFL 549 

Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625 

I : : | | : | : : | | | : : | : | : : : : 

Db 550 RNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTN 597 



RESULT 10 



AAU96993 

ID AAU96993 standard; protein; 651 AA. 
XX 

AC AAU96993; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R419P protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW mutant; mutein. 
XX 

OS Homo sapiens.

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 419

FT /note= "Wild-type Arg substituted by Pro" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B.

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 10; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 



CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R419P protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 

Query Match 19.9%; Score 697; DB 5; Length 651; 

Best Local Similarity 28.9%; Pred. No. 6.3e-62; 

Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16; 

Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75 

II : I I I I I | : : | : : | : : I I : I I : : : : I 

Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64 

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134 

I : : : : I I I I I : : I : I I I I I : : I I I : : I I I I I : : : : 

Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115 

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194 

II: : : I : : I I : I I : I I I I I I I : I : : I : I : I I I : I I 

Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254 

II II : I I : I : I I I I I I I I I I I I : I : : : I I I I : I I I I I : : I 

Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234 

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314

I I I : I I : I : : : : I I I I I : : I : I I I : : : : I I : I I : : I Mill: 

Db 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362 

I I I I I I : I I I I : I : I : I : I : I : : I : : : : I : : : I 
Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421 

I : : I I I I I : I : I I I I : : : 

Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475 

: : I : : I I : I I I I I : | : : | : : : I 

Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ DPVGLLYQFVGATPYTGMLNAVNLFPVLR 446

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535

I : I : I I I I I I I I : I : II I I : I 
Db 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499

Qy 536 WLVVFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577

: I I I I : I I : : : I I I I : 

Db 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFL 549

Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625 

I : : I I : I : : I I I : : I : | : : : : 



Db 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597



RESULT 11 
AAU96984 

ID AAU96984 standard; protein; 651 AA. 
XX 

AC AAU96984; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 protein. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW chromosome 2p21. 
XX 

OS Homo sapiens. 
XX 

FH Key Location/Qualifiers

FT Misc-difference 2..15

FT /note= "Encoded by GGTCTC" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B.

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR N-PSDB; ABK51681. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 52; Page 35-36; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 



CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 protein of the invention. This 

CC sequence is encoded by the human ABCG5 gene located on chromosome 2p21 

XX 

SQ Sequence 651 AA; 

Query Match 19.9%; Score 697; DB 5; Length 651; 

Best Local Similarity 28.9%; Pred. No. 6.3e-62; 

Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16; 

Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75 

II : III II | : :|::| : :| I : ||:: : : I 

Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134 

I : : : : I I I I I : : I : I I I I I : : I I I : : I I I I I : : : : 

Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115 

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194 

II: : : I : : I I : I I : I I I I I I I : I : : I : I : I I I : I I 

Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174 

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254

II II : I I : I : I I II I I I I I I I I : I : : : I I I I : I I I I I : : I 

Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234 

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314 

I I I : I I : I : : : : I I I I I : : I : I I I : : : : I I : I I : : I Mill: 

Db 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362 

I I I I I I : I I I I : I : I : I : I : I : : I : : : : I : : : I 
Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421 

I : : I II II : MM I I : : : 

Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390 

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475

: : I : : I MM I I I : | : : I : : : I 

Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ DRVGLLYQFVGATPYTGMLNAVNLFPVLR 446

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535 

I : IMIII I I I I : I : II I I : I 
Db 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499

Qy 536 WLVVFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577

Mill : I : I:: Ml IM 

Db 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFL 549



Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625 

I : : II : I : : I I I : : I : I : : : : 

Db 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597



RESULT 12 
AAE13290 

ID AAE13290 standard; protein; 651 AA. 
XX 

AC AAE13290; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Human sitosterolaemia susceptibility gene (SSG) protein. 
XX 

KW Human; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; therapy; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 2p21. 

XX 

OS Homo sapiens. 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 

PR 15-MAY-2000; 2000US-0204234P .
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR N-PSDB; AAD22009. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 19; Fig 8; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 



CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is human SSG protein. Human SSG is located on chromosome 

CC 2p21 

XX 

SQ Sequence 651 AA; 

Query Match 19.9%; Score 697; DB 5; Length 651; 

Best Local Similarity 28.9%; Pred. No. 6.3e-62; 

Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16; 

Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75

||:||| II | : :|::| : : I I : II:: : : I 

Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64 

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134

| : : : : i I I I I : : I : I I I I I : : I I I : : I I I I I : : : : 

Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115 

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194 

||: : : I :: I I : I I : I I I I I I I : I : : I : I : II I : I I 

Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254

|| || : || : I : I I I I I I I I I I I I : I : : : I I I I : I I I I I : : I 
Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314

I | | : | | : I : : : : I I I I I : : i : I I I : : : : I I : I I : = I 11111 = 

Db 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362 

I I I I I I : I I I I : I : I : I : I : I : : I : : : : \ : : : I 
Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348 

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421

I : : I I I I I : I : I I I I : : : 
Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475

: : | : : I I : I I I I I : | : : | : : : I 

Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ DRVGLLYQFVGATPYTGMLNAVNLFPVLR 446

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535

I: I : I I I I I I II :!: II I I : I 
Db 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499

Qy 536 WLVVFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577

: I I I I : I : I : : : I I I I : 

Db 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFL 549

Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625

I: : II :l ::| I I: :l : |::: : 

Db 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597



RESULT 13 



AAE31704 

ID AAE31704 standard; protein; 651 AA. 
XX 

AC AAE31704; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Human ABCG5 protein. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5. 
XX 

OS Homo sapiens. 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823 . 
XX 

PR 20-NOV-2000; 2000US-0252235P. 

PR 28-NOV-2000; 2000US-0253645P . 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR N-PSDB; AAD48882. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol-

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 28; Page 78-79; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolaemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG5 protein 
XX 

SQ Sequence 651 AA; 

Query Match 19.9%; Score 697; DB 6; Length 651; 
Best Local Similarity 28.9%; Pred. No. 6.3e-62; 

Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16 



Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75



II: | : :|::| : :| | : I |:: : : I 

Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134 

| : : : : I I II I : : I : I M I I : : I II : : I I I I I : : : : 

Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115 

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194 

II: : : I : : I I : I I : I I I I I M : I : : I : I : M I : I I 

Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254 

II II : I I : I : I I I I I I I I I I I I : I : : : I II I : I I I II : : I 
Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314

| ||: ||:|::::|||||::|:|ll : ::: I |: I I: :l Mill : 

Db 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF------LEKVRDLDDFLWK 362

I I I I I I : I I I I : I : I : I : I : I : : I : : : : I : : : I 
Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421 

I : : I II II : MM I M : : 

Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390 

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475

: : I : : I MM I I M | : : I : : : I 

Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ DRVGLLYQFVGATPYTGMLNAVNLFPVLR 446

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535

M I : II II I I II : I : I I I I : I 

Db 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499

Qy 536 WLVVFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577

: | | || : I : M : Ml I I : 

Db 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFL 549

Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625

|: : II M :M I M M : |::: : 

Db 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597



RESULT 14 
AAU96989 

ID AAU96989 standard; protein; 651 AA. 
XX 

AC AAU96989; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R419H protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol;

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW mutant; mutein. 



XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 419

FT /note= "Wild-type Arg substituted by His" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 9; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R419H protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 



Query Match 19.9%; Score 696; DB 5; Length 651;



Best Local Similarity 28.9%; Pred. No. 8e-62; 

Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16; 



Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75 

M : I I I I I | : : | : : I : : I I : I I : : : : I 

Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64 

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134

I : : :: I I I I I : : I : I I I I I : : I I I : : I I I I M : : : 

Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115 

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194 

||: : : I :: I I : I I : I I I I I I I : I : : I : I : II I : I I 

Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254 

|| || : I I : I : I I II I I I I I I | I : I : : : I I I I : I I I I I : : I 
Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314

I I I : I I : I : : : : I I I I I : : I : I I I : : : : I I : I I • ' I Mill: 

Db 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362 

I I I I I I : II I I : I : I : I : I : I : : I : : = : I : : : I 
Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348 

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421 

I: : I II II : MM I M :: 

Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475 

: : | : : I MM I I M | : : I : : : I 

Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ DHVGLLYQFVGATPYTGMLNAVNLFPVLR 446

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535 

I : IMIII I I I I : I : M 11=1 

Db 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499 

Qy 536 WLVVFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577

: | | | I : I : I :: Ml I M 

Db 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFL 549

Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625

I : : II : | : : | | I : : I : M : : : 

Db 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597



RESULT 15 
AAU96992 

ID AAU96992 standard; protein; 651 AA. 
XX 

AC AAU96992; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant E146Q protein sequence. 



XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol;

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease;

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 146

FT /note= "Wild-type Glu substituted by Gln"
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 .
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 12; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant E146Q protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 



XX 

SQ Sequence 651 AA; 



Query Match 19.8%; Score 694; DB 5; Length 651; 

Best Local Similarity 28.7%; Pred. No. 1.3e-61; 

Matches 186; Conservative 125; Mismatches 241; Indels 96; Gaps 16; 

Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75

|| : Ml || I : :|::| : Ml : II:: : : I 

Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134 

I : : : : I I I I I : : I : I I I I I : : I I I : : I I I I I : : : : 

Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194

||: : : I : : I I : I I : I I I I : I I : I : : I : I : I I I : I I 

Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRQTLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254 

|| || : I I : I : I I I I I I I I I I I I : I : : : II I I : I I I I I : : I 
Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314

I | | : | | : | : : : : I I I I I : : I : I I I : : : : I I : I I — I Mill: 

Db 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362 

I I I I I I : I I I I : I : I : I : I : I : : I : : : : I : : : I 
Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421 

I: :| I I II : MM I h - 
Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390 

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475

: : | : : I Mil I IM |: :|: :: I 

Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ DRVGLLYQFVGATPYTGMLNAVNLFPVLR 446

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535 

I : I : I I II I I II M : II I I : I 

Db 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499

Qy 536 WLVVFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577

:|||| : I : M : Ml I I : 

Db 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFL 549

Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625 

I : : I I : I : M I M M : I : : : : 

Db 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597



Search completed: February 27, 2004, 06:44:22 
Job time : 49.0351 secs



GenCore version 5.1.6 
Copyright (c) 1993-2004 Compugen Ltd.



OM protein - protein search, using sw model

Run on: February 27, 2004, 07:11:48 ; Search time 15.2492 Seconds
(without alignments)
2278.426 Million cell updates/sec

Title: US-09-989-981A-8
Perfect score: 3506
Sequence: 1 MAGKAAEERGLPKGATPQDT...FMVLYYVSLRFIKQKPSQDW 673

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5


Searched: 389414 seqs, 51625971 residues

Total number of hits satisfying chosen parameters: 389414

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database : Issued_Patents_AA:*

1: /cgn2_6/ptodata/2/iaa/5A_COMB.pep:*

2: /cgn2_6/ptodata/2/iaa/5B_COMB.pep:*

3: /cgn2_6/ptodata/2/iaa/6A_COMB.pep:*

4: /cgn2_6/ptodata/2/iaa/6B_COMB.pep:*

5: /cgn2_6/ptodata/2/iaa/PCTUS_COMB.pep:*

6: /cgn2_6/ptodata/2/iaa/backfiles1.pep:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
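
The percentages printed in each result header appear to follow directly from the other figures in this listing: "Query Match" is consistent with Score divided by the Perfect score (3506 for this query), and "Best Local Similarity" is consistent with Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). The Python sketch below checks that reading against values reported in this listing. The function names are hypothetical, and the formulas are an inference from the printed numbers, not a documented GenCore definition.

```python
def query_match_percent(score: float, perfect_score: float) -> float:
    """Score as a percentage of the query's self-match score, to one decimal.

    Assumption: 'Query Match' = 100 * Score / Perfect score.
    """
    if perfect_score <= 0:
        raise ValueError("perfect_score must be positive")
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity(matches: int, conservative: int,
                          mismatches: int, indels: int) -> float:
    """Identical columns as a percentage of all alignment columns, to one decimal.

    Assumption: the four counts together cover every column of the alignment.
    """
    columns = matches + conservative + mismatches + indels
    if columns <= 0:
        raise ValueError("alignment must contain at least one column")
    return round(100.0 * matches / columns, 1)


PERFECT_SCORE = 3506  # "Perfect score" from the run header

# (Score, Matches, Conservative, Mismatches, Indels) as printed in the listing
hits = {
    "AAU96984": (697, 187, 124, 241, 96),
    "AAU96992": (694, 186, 125, 241, 96),
    "US-09-245-808-1": (640.5, 187, 139, 273, 89),
    "US-09-767-594-1": (638.5, 175, 131, 254, 67),
}

for name, (score, m, c, x, i) in hits.items():
    print(name,
          query_match_percent(score, PERFECT_SCORE),
          best_local_similarity(m, c, x, i))
```

Under these assumptions the computed values agree with the printed headers for the four hits above, which supports the reading but does not establish how the program itself computes them.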

SUMMARIES 
ALIGNMENTS



RESULT 1 
US-09-245-808-1 

; Sequence 1, Application US/09245808 

; Patent No. 6313277 

; GENERAL INFORMATION: 

; APPLICANT: Doyle, L. Austin 

; APPLICANT: Abruzzo, Lynne V. 

; APPLICANT: Ross, Douglas D.

; TITLE OF INVENTION: Breast Cancer Resistance Protein (BCRP) and DNA which 

; TITLE OF INVENTION: encodes it 

; FILE REFERENCE: Ross UMb conversion 

; CURRENT APPLICATION NUMBER: US/09/245,808

; CURRENT FILING DATE: 1999-02-05 

; EARLIER APPLICATION NUMBER: 60/073763

; EARLIER FILING DATE: 1998-02-05 

; NUMBER OF SEQ ID NOS: 7

; SOFTWARE: PatentIn Ver. 2.0

; SEQ ID NO 1 



; LENGTH: 655

; TYPE: PRT

; ORGANISM: Human MCF-7/AdrVp cells
US-09-245-808-1 

Query Match 18.3%; Score 640.5; DB 4; Length 655; 

Best Local Similarity 27.2%; Pred. No. 5.1e-62; 

Matches 187; Conservative 139; Mismatches 273; Indels 89; Gaps 21; 

Qy 19 DTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSP 78 

: I : I | : : | | I I : : I : I I I : I 
Db 16 NTNG FPATASNDLKAFTEGA — VLSFHNICYRVKLKSGF LP 54 

Qy 79 SCQNSCELGI-QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQ 137

I : 1 I I : : : : I : I I : I : I I : : I I I I I : I : I I : I I I 

Db 55 -CRKPVEKEILSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGA 111 

Qy 138 PSSPQLVRKC-VAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELR 196 

I II : I I : :: I I I I I I I I : I I I : :::: I : I I I I 

Db 112 PRPANF--KCNSGYVVQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELG 169

Qy 197 LRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLS 256 

I : I I : : I I : : I I : I I I I I : I I I I : : I : : I II I I I I I : I I I I II: : : I 
Db 170 LDKVADSKVGTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLK 229 

Qy 257 RLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSN 316 

|::| I :: hllll ||:||| : |: II :: I II : II : II I hi 
Db 230 RMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNN 289

Qy 317 PADFYVDLTSIDRR SREQELATRE--KAQSLAALFLEKVRDL--DDFLWKAETK-- 366

I I I I : : I : : I : I I : : I : : I I : : : : : I I I I 

Db 290 PADFFLDIINGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYK-ETKAE 348 

Qy 367 DLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 419 

: : I : : : I I : : I I : : 

Db 349 LHQLSGGEKKKKITVFKEISYTTSFC HQLRWVSKRSFKNLLGNPQASIA 397

Qy 420 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYS 473

: : : I I : I I I ■ : : I : I I : : : I : I 

Db 398 QIIVTVVLGLVIGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVSAVE 446

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGE-LPEHCAYIIIYGMPTYWLANLRPGLQPF 528 

I : : : I II II hi : II I I : I :: I : I I 

Db 447 LFVVEKKLFIHEYISGYYRVSSYFLGKLLSDLLPMTMLPSIIFTCIVYFMLGLKPKADAF 506

Qy 529 LLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPA 588

: : : I : I I I I I I : I : : : | : : I I : : : : : 

Db 507 FVMMFTLMMVAYSASSMALAIAAGQSVVSVATLLMTICFVFMMIFSGLLVNLTTIASWLS 566

Qy 589 WISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLT IAVSGDKIL--SAMELDSYP 639

I: I I: I I :l : : I I I :|:: I ::| : 

Db 567 WLQYFSIPRYGFTALQHNEFLGQNF-CPGLNATGNNPCNYATCTGEEYLVKQGIDLSPWG 625

Qy 640 LYAIYLIVIGLSGGFMVLYYVSLRFIKQ 667 

|: :: : : |: : |: | |:|: 
Db 626 LWKNHVALACMIVIFLTIAYLKLLFLKK 653



RESULT 2 
US-09-767-594-1 

Sequence 1, Application US/09767594 
Patent No. 6521635 
GENERAL INFORMATION: 
APPLICANT: Bates, Susan 
APPLICANT: Robey, Robert 

APPLICANT: The Government of the United States of America 
APPLICANT: as represented by the Secretary of the 
APPLICANT: Department of Health and Human Services 

TITLE OF INVENTION: Inhibition of MXR Transport by Acridine Derivatives 
FILE REFERENCE: 015280-402100US 
CURRENT APPLICATION NUMBER: US/09/767,594
CURRENT FILING DATE: 2001-01-22 
PRIOR APPLICATION NUMBER: US 60/177,410 
PRIOR FILING DATE: 2000-01-20 
NUMBER OF SEQ ID NOS: 2
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 1 
LENGTH: 655 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE:

OTHER INFORMATION: human mitoxanthrone resistance (MXR)/BRCP/ABCP
OTHER INFORMATION: protein 
US-09-767-594-1 

Query Match 18.2%; Score 638.5; DB 4; Length 655; 

Best Local Similarity 27.9%; Pred. No. 8.5e-62; 

Matches 175; Conservative 131; Mismatches 254; Indels 67; Gaps 17; 

[Alignment rows preceding Qy 421 / Db 399 are not recoverable from the scan. Surviving row start positions: Qy 80, 139, 198, 258, 318, 367; Db 55, 113, 171, 231, 291, 350. Surviving sequence fragments:]
Qy ... NSCELGI-QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQP 138
Db ... -HQLRWVSKRSFKNLLGNPQASIAQ 398



Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYS 473



: : : I I : I I I : : I : I I : : : I : I 

Db 399 IIVTWLGLVIGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFS SVSAVEL 447 

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGE-LPEHCAYIIIYGMPTYWLANLRPGLQPFL 529 

I : : : I II II I : I : II I I : I : : I : I I 

Db 448 FWEKKLFIHEYISGYYRVSSYFLGKLLSDLLPMRMLPSIIFTCIVYFMLGLKPKADAFF 507 

Qy 530 LHFLLWLWFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAW 589

: : : I : II I I I I : I : : . : I : : I I : : : : : I 

Db 508 VMMFTLMMVAYSASSMAIAIAAGQSWSVATLL^ 567 

Qy 590 ISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLT IAVSGDKIL -- SAMELDSYPL 640

: I I : I I : I : : I I I : I : : I : : I : I 

Db 568 LQYFSIPRYGFTALQHNEFLGQNF-CPGLNATGNNPCNYATCTGEEYLVKQGIDLSPWGL 626 

Qy 641 YAIYLIVIGLSGGFMVLYYVSLRFIKQ 667 

: : : : : I : : I : I I : I : 
Db 627 WKNHVALACMI VI FLTIAYLKLLFLKK 653 
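
The percentages in the summary lines of each result appear to follow directly from the raw counts. The sketch below is a reader's cross-check, not documented GenCore behavior. It assumes that "Query Match" is the hit score divided by the query's self-score (3506, the score reported for the 100.0% match earlier in this listing) and that "Best Local Similarity" is Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). The function names are hypothetical.

```python
import re

# Assumption: 3506 is the query's self-score, taken from the 100.0% hit
# reported earlier in this listing.
SELF_SCORE = 3506.0

def parse_summary(text):
    """Extract the numeric fields of a summary block into a dict."""
    fields = {}
    for key in ("Score", "Matches", "Conservative", "Mismatches", "Indels", "Gaps"):
        m = re.search(rf"{key}\s+([0-9.]+)", text)
        if m:
            fields[key] = float(m.group(1))
    return fields

def query_match_pct(score, self_score=SELF_SCORE):
    """Hit score as a percentage of the query's self-score (assumed definition)."""
    return 100.0 * score / self_score

def local_similarity_pct(fields):
    """Identical columns as a percentage of all alignment columns (assumed definition)."""
    columns = (fields["Matches"] + fields["Conservative"]
               + fields["Mismatches"] + fields["Indels"])
    return 100.0 * fields["Matches"] / columns

if __name__ == "__main__":
    # Summary block of RESULT 2, copied from the listing.
    block = ("Query Match 18.2%; Score 638.5; DB 4; Length 655; "
             "Best Local Similarity 27.9%; Pred. No. 8.5e-62; "
             "Matches 175; Conservative 131; Mismatches 254; Indels 67; Gaps 17;")
    f = parse_summary(block)
    print(f"{query_match_pct(f['Score']):.1f}", f"{local_similarity_pct(f):.1f}")
```

Recomputing the two percentages this way gives a quick check on whether the counts in a summary block survived the scan.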



RESULT 3 

US-09-614-912-138 

Sequence 138, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA
CURRENT APPLICATION NUMBER: US/09/614,912
CURRENT FILING DATE: 2000-07-12
PRIOR APPLICATION NUMBER: 60/143,401
PRIOR FILING DATE: 1999-07-12
PRIOR APPLICATION NUMBER: 60/143,412
PRIOR FILING DATE: 1999-07-12
PRIOR APPLICATION NUMBER: 60/146,650
PRIOR FILING DATE: 1999-07-30
PRIOR APPLICATION NUMBER: 60/170,906
PRIOR FILING DATE: 1999-12-15
PRIOR APPLICATION NUMBER: 60/172,959
PRIOR FILING DATE: 1999-12-21
PRIOR APPLICATION NUMBER: 60/172,946
PRIOR FILING DATE: 1999-12-21
NUMBER OF SEQ ID NOS: 204
SOFTWARE: Microsoft Office 97 
SEQ ID NO 138 
LENGTH: 617 
TYPE: PRT 

ORGANISM: Zea mays 



US-09-614-912-138 



Query Match 14.2%; Score 497.5; DB 4; Length 617; 

Best Local Similarity 26.6%; Pred. No. 4.5e-46; 

Matches 179; Conservative 125; Mismatches 246; Indels 123; Gaps 28; 

Qy 44 PNTLEVRDLN YQVDLASQVPWFEQLAQFKMPWT S P S CQNS CELGI QN LSFKV 95 

I : : : | | | | : : : : I : I : : : 

Db 8 PLAMS FDNVN YYVDMPAEMK HQGVQDDRLQLLRE VTGS F 46 

Qy 96 RSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHN 155 

I I : I : : I I I I : : I : I I : I I I I I : I I I I I : : : I : : 

Db 47 RPGVLTALMGVSGAGKTTLMDVLAGRKTGGYIE-GDIRIAGYPKNQATFARISGYCEQND 105 

Qy 156 QLL PNLT VRET LAFI AQMRL P RT FSQAQ RDKRVEDVIAELRLRQCADTRVGNMYV 210 

I : I I I I : I : I : I I I : : : I : : I : : I I II : 

Db 106 IHSPQVTVRESLIYSAFLRLPGKIGDQEITDDIKMQFVDEVMELVELDNLRDALVGLPGI 165 

Qy 211 RGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLH 270 

Ml : I : I : : I I : I : I I I : : I I I I I I I I : I : : : I : I I : : : I 

Db 166 TGLSTEQRKRLT I AVELVANP S 1 1 FMDEPTS GLDARAAAI VMRT VTINTVDTGRTVVCT I H 225 

Qy 271 QPRSDIFRLFD-LVLLMTSGTPIYLGA AQHMVQYFTAI -GYP — CPRYSNPADFYV 322 

II III 111:11 I III : I I I : I I I I I I : I I I I : : 

Db 226 QPS I DI FESFDELLLLKRGGQVI YSGKLGRNSQKMVEYFEAI PGVPKI KDKY-NPATWML 284 

Qy 323 DLTSIDRRSREQEIATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTCV ESSVT 379 

: : : I : II : II II MM: I : I 

Db 285 EVSSV AT EVRLKMDFAKYYETSDLYKQNKVLVNQLSQP 322 

Qy 380 PLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGH 438 

I : I ||: : I I : : I : I I : : I : : : : I : : : 

Db 323 EPGTSDLYFPTEYSQSTIGQFKACLWKQWLTYWRSPDYNLVRYSFTLLVALLLGSIFWRI 382 

Qy 439 GS I QL S FMDTAALLFMI GAL I P FNVI LDV- 1 S KC YS ERAMLYYELEDGLYTTG 490 

I : : I I : I I I : : : : : I : I : I I : I I I : I : 

Db 383 GT NME DATT L GMVI GAM — YTAVMF I G I NN C S TVQ P WS I E RT VF Y RE RAAGMY SAM 437 

Qy 491 PYFFAKI LGELP EHCAY-IIIYGMPTY-WLANLRPGLQPFLLHFLLWLWFCCRI 544 

II I::: 1:1 : I :|:| I :: I I I I : : 

Db 438 PYAIAQWIEI PYVFVQTTYYTLIVYAMMSFQWTA VKFFWFFFI S YFS FLYFT Y 491 

Qy 545 MALAAAALLPTFHMASFFSNALYNSFYLAGGFMI NLSSLWTVPAWI SKVSFLRWCFE 601 

: I : : I : I I I : I : : I I I I I : I II II 
Db 492 YGMMAVSISPNHEVASIFAAAFFSLFNLFSGFFIPRPRIPGWWIWYYWICP LAWTVY 548 

Qy 602 GLMKI QFSRRT YKMPLGNL — TIAVSG — DKILS AMELDSYPLYAIYLIVIGL 650 

I I : I : I : I I : I I : : : I | | : | | : : : 

Db 549 GLIVTQY GDLEDLISVPGESEQTISYYVTHHFGYHRDFLPVIAPVLVLFAV 599 

Qy 651 S GGFMVL YYVS LR 663 

I III:: 
Db 600 F — FAFLYAVCIK 610 
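
Because each alignment row carries its own start and end coordinates, the number of residues on a row can be checked against end - start + 1. The sketch below is one way to flag rows that lost or gained characters in scanning. It assumes rows of the form "Qy|Db start SEQUENCE end" and that anything other than a letter (spaces, dashes, stray digits) is a gap or scan debris. The function name is hypothetical.

```python
import re

ROW = re.compile(r"^(Qy|Db)\s+(\d+)\s+(.*?)\s+(\d+)\s*$")

def check_row(line):
    """Return (label, start, end, residues, consistent) for one alignment row.

    Only alphabetic characters are counted as residues; gaps, spaces and
    stray digits are ignored.
    """
    m = ROW.match(line.strip())
    if m is None:
        raise ValueError("not an alignment row: %r" % line)
    label, start, body, end = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    residues = sum(ch.isalpha() for ch in body)
    return label, start, end, residues, residues == end - start + 1

if __name__ == "__main__":
    rows = [
        "Qy 96 RSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHN 155",
        "Db 47 RPGVLTALMGVSGAGKTTLMDVLAGRKTGGYIE-GDIRIAGYPKNQATFARISGYCEQND 105",
        "Db 166 TGLSTEQRKRLT I AVELVANP S 1 1 FMDEPTS GLDARAAAI VMRT VTINTVDTGRTVVCT I H 225",
    ]
    for row in rows:
        print(check_row(row))
```

A row that fails the check is not necessarily wrong in the source record; it only indicates that the scanned text no longer matches its stated coordinates.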



RESULT 4 

US-09-614-912-140 



Sequence 140, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 

APPLICANT: Allen, Steve 

APPLICANT: Rafalski, Antoni 

APPLICANT: Orozco, Buddy 

APPLICANT: Miao, Gou-Hau 

APPLICANT: Famodu, Omolayo O. 

APPLICANT: Lee, Jian Ming 

APPLICANT: Sakai, Hajime 

APPLICANT: Weng, Zude 

APPLICANT: Caimi, Perry G 

APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912
CURRENT FILING DATE: 2000-07-12
PRIOR APPLICATION NUMBER: 60/143,401
PRIOR FILING DATE: 1999-07-12
PRIOR APPLICATION NUMBER: 60/143,412
PRIOR FILING DATE: 1999-07-12
PRIOR APPLICATION NUMBER: 60/146,650
PRIOR FILING DATE: 1999-07-30
PRIOR APPLICATION NUMBER: 60/170,906
PRIOR FILING DATE: 1999-12-15
PRIOR APPLICATION NUMBER: 60/172,959
PRIOR FILING DATE: 1999-12-21
PRIOR APPLICATION NUMBER: 60/172,946
PRIOR FILING DATE: 1999-12-21
NUMBER OF SEQ ID NOS: 204
SOFTWARE: Microsoft Office 97 
SEQ ID NO 140 
LENGTH: 1296 
TYPE: PRT 

ORGANISM: Oryza sativa 
US-09-614-912-140 

Query Match 13.7%; Score 481.5; DB 4; Length 1296; 

Best Local Similarity 25.5%; Pred. No. 1e-43;

Matches 180; Conservative 133; Mismatches 271; Indels 123; Gaps 29 

Qy 4 KAAEERGLPKGATPQDTSGLQDRLFSSESDNS LYFTYSGQPNTLEVRDLNYQV 56 

I II : : : | : | | | I : : III I : : hill 

Db 645 KEMREMRLSARLSNS S SNGV- SRLMSI GSNEAGPRRGMVLPFT PLSMSFDDVNYYV 699 

Qy 57 DLASQVPWFEQLAQFKMPWT S PS CQNSCELGI QNLS FKVRS GQMLAI I GS S GCGRAS LLD 116 

I : : : : : : I : : : : : I : I : : I " I I I : : I : I 

Db 700 DMPAEMK QQGWDDRLQL- LRDVTGS FRPAVLTALMGVS GAGKTTLMD 746 

Qy 117 VI TGRGHGGKI KS GQI WINGQP S S PQLVRKCVAHVRQHNQLLPNLTVRETLAFI AQMRLP 176 

I : I I | | | : | : | : | | : : : : I : : I : I I I I : I : I : I I I 
Db 747 VLAGRKTGGYIE-GDMRISGYPKNQETFARISGYCEQNDIHSPQVTVRESLIYSAFLRLP 805 

Qy 177 RTFSQAQ RDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNP 231 

: : : |::|: : | III : III :|:|::| |:|: II 

Db 806 EKIGDQEITDDIKIQFVDEVMELVELDNLKDALVGLPGITGLSTEQRKRLTIAVELVANP 865 



Qy 232 GILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFD-LVLLMTSGT 290 

I :: I I I I I I I I : I : : : I : I I : :: I I I III I I I : I I I 

Db 866 S 1 1 FMDEPT S GLDARAAAI VMRTVRNTVDTGRT WCT I HQP S I DI FEAFDELLLLKRGGQ 925 

Qy 291 PIYLGA AQHMVQYFTAI-GYP — CPRYSNPADFYVDLTSIDRRSREQELATREKAQ 343 

III : I I : : I I I I I I : I I I | : : : : : I : 
Db 926 VI YSGQLGRNSQKMI EYFEAI PGVPKIKDKY-NPATWMLEVS SV 968 

Qy 344 SLAALFLEKVRDLDDFLWKAETKDLDEDTCV ESSVTPLDTNCLPSPTK-MPGAVQQF 399 

: II II : I I I : I : I I : I I I I =11 

Db 969 AAEVRLNMDFAEYYKTSDLYKQNKVLVNQLSQPEPGTSDLHFPTKYSQSTIGQF 1022 

Qy 400 TTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALI 459 

: :| :| |: : :: :| ::: |: : :| :| j|: 

Db 1023 RACLWKQWLTYWRS PDYNLVRFS FTLFTALLLGTI FWKI GT KMGNAN S L RMVI GAM- 1078 

Qy 460 PFNVILDV-ISKCYS ERAMLYYELEDGLYTTGP YFFAKI LGELP EH CAY- 507 

: : : : | : | : I I : I I I : I : II I : : : I : I II 

Db 1079 -YTAVMFIGINNCATVQPIVSIERTVFYRERAAGMYSAMPYAIAQWMEIPYVFVQTAYY 1137 

Qy 508 -IIIYGMPTY-WLANLRPGLQPFLLHFLLWL^ 565 

: I : I I : : I I I I : : : hi :h h I 

Db 1138 TLIVYAMMSFQWTA AKFFWFFFVS YFS FLYFT YYGMMTVAI S PNHEVAAI FAAA 1191 

Qy 566 LYNS FYLAGGFMI NLS S LWTVPAWI SKVS FLRWCFEGLMKI QFS RRT YKMPLGNL — 620 

h I I II : I h II I h h hi 
Db 1192 FYSLFNLFSRFFIPRPRIPKWWIWYYWLCP LAWTVYGLIVTQY GDLEQ 1239 

Qy 621 TIAVSGDKILSAMELDSY PLYAI YLIVIGLSGGFM 655 

hi I I : I h I h: : II 

Db 1240 IISVPGQ SNQTISYYVTHHFGYHRKFMPWAPVLVLFAVFFAFM 1283 



RESULT 5 

US-09-614-912-144 

Sequence 144, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 



; PRIOR FILING DATE: 1999-07-30 
; PRIOR APPLICATION NUMBER: 60/170,906 
; PRIOR FILING DATE: 1999-12-15 
; PRIOR APPLICATION NUMBER: 60/172,959 
; PRIOR FILING DATE: 1999-12-21 
; PRIOR APPLICATION NUMBER: 60/172,946 
; PRIOR FILING DATE: 1999-12-21 
; NUMBER OF SEQ ID NOS: 204
; SOFTWARE: Microsoft Office 97 
; SEQ ID NO 144 
; LENGTH: 539 
TYPE: PRT 

ORGANISM: Triticum aestivum 
FEATURE:

NAME/KEY: UNSURE
LOCATION: (272)..(273)
US-09-614-912-144 



Query Match 12.6%; Score 440.5; DB 4; Length 539; 

Best Local Similarity 25.3%; Pred. No. 8.4e-40; 

Matches 147; Conservative 114; Mismatches 241; Indels 79; Gaps 18; 

Qy 124 GGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQ 183 

I I I : I : I : : I I : : : I : : I : : I : I : I I I : I I I : 
Db 1 GGYI E- GEITVS GYPKKQETFARI S GYCEQNDI HS PHVTI YES LVFSAWLRLPAEVDS ER 59 

Qy 184 RDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGL 243 

| : | : : : : | || I I I I : I : I : : I I : I : I I I : : I I II I I I 

Db 60 RKMFIEEIMDLVELTSLRGALVGLPGVNGLSTEQRKRLTIAVELVANPSIIFMDEPTSGL 119 

Qy 244 DSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFD-LVLLMTSGTPIYLGA AQ 298 

I : I : :: I : I I : :: I I I III Mil: I I I : I : 

Db 120 DARAAAIVMRTVRNTVNTGRTWCTIHQPSIDIFEAFDELFLMKRGGEEIYVGPVGQNSA 179 

Qy 299 HMVQYFTAI GYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFL 350 

: : : : I I I II I I I : : : : : I I : : I : I I 

Db 180 NLIEYFEEIEGISKIKDGY N PAT WML E VS S SAQEEM LGIDFA 221 



Qy 351 EKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGA-VQQFTTLIRRQISN 409 

II ::| ::: I I : :: I II: : I I : :| = 

Db 222 EVYR QSELYQRNKELIKELSMPAPGSSDLNFPTQYSRSFVTQCLACLWKQXXS 274 

Qy 410 DFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDT AALLFM IGA 457 

: | : : :: : : I : : : I I I I I : I : : I : 

Db 275 YWRNPSYTAVRLLFTIVIALMFGTMFWDLGSKTRRSQDLFNAMGSMYAAVLYIGVQNSGS 334 

Qy 458 LIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYW 517 

: I I : I I : I I I : I : I I I :: I I : I I I I 

Db 335 VQPVWV ERTVFYRERAAGMYSAFP YAFGQVAI EFP YVLVQALI YGGLVYS 385 

Qy 518 IANLRPGLQPFLLHFLLWLWFCCRIMAIAAAALLPTFHMASFFSNALYNSFYLAGGFM 577 

: : I I : : : : : I I I : | : | : I I I : I I : : 

Db 386 MIGFEWTVAKFLWYLFFMYFTMLYFTFYGMMAVGLTPNESIAAIISSAFYNVWNLFSGYL 445



Qy 578 I NLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAME 634 

| I I : II I : I I I : I I : I I I I : I I 

Db 446 IPRPKLPIWWRWYSWICPVA WTLYGLVASQFG— DIQQPLDQ GVPGPQITVAQF 497 



Qy 635 LDSY PLYAIYLIVIGLSGGFMVLY-YVSLRFIKQK 668 

: | | : : : : : : I I : : : I I I I 

Db 498 VT D Y FG FHH D FLWVVAMVHVAFT VL FAFL F S FAI MR FN FQ K 538 



RESULT 6 

US-08-665-259-25 

Sequence 25, Application US/08665259 
Patent No. 6028173 
GENERAL INFORMATION:

APPLICANT: Landes, Gregory M. 
APPLICANT: Burn, Timothy C. 
APPLICANT: Connors, Timothy D. 
APPLICANT: Dackowski, William R. 
APPLICANT: Van Raay, Terence J. 
APPLICANT: Klinger, Katherine W. 

TITLE OF INVENTION: NOVEL HUMAN CHROMOSOME 16 GENES,

TITLE OF INVENTION: COMPOSITIONS, METHODS OF MAKING AND USING SAME 
NUMBER OF SEQUENCES: 73 
CORRESPONDENCE ADDRESS:

ADDRESSEE: GENZYME CORPORATION
STREET: One Mountain Road
CITY: Framingham
STATE: Massachusetts
COUNTRY: United States of America
ZIP: 01701
COMPUTER READABLE FORM:

MEDIUM TYPE: Floppy disk
COMPUTER: IBM PC compatible
OPERATING SYSTEM: PC-DOS/MS-DOS
SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/665,259
FILING DATE: 17-JUN-1996 
CLASSIFICATION: 435 
ATTORNEY/AGENT INFORMATION: 
NAME: Dugan, Deborah A. 
REGISTRATION NUMBER: 37,315 
REFERENCE/DOCKET NUMBER: IG5-9.1
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (508) 872-8400 
TELEFAX: (508) 872-5415 
INFORMATION FOR SEQ ID NO: 25: 
SEQUENCE CHARACTERISTICS: 
LENGTH: 1684 amino acids 
TYPE: amino acid 
TOPOLOGY: linear 
MOLECULE TYPE: protein 
US-08-665-259-25 

Query Match 7.1%; Score 250; DB 3; Length 1684; 

Best Local Similarity 24.8%; Pred. No. 1.3e-17; 

Matches 124; Conservative 85; Mismatches 176; Indels 116; Gaps 27; 

Qy 82 NSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSP 141 

| : : : | : : I I : : : I : I I : : I : : I I I I : : I : I I 



Db 523 NKDRAAVRDLNLNLYEGQITVLLGHNGAGKTTTLSMLTGL FPPTSGRAYISGYEISQ 579



Qy 142 QLV--RKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQ 199 

: I II: I I : I I I I I I I I I I : : I I I : I > : 

Db 580 DMVQIRKSLGLCPQHDILFDNLTVAEHLYFYAQLK GLSRQKCPEEVKQMLHIIG 633 

Qy 200 CADTRVGNMYVRGLSGGERRRVS I GVQLLWNPGI LI LDEPTSGLDSFTAHNLVKTLSRLA 259 

I I | | Ml ||: |: :|| ||:|: : : II 

Db 634 LEDK—WNSRSRFLSGGMRRKLSIGIALIAGSKVLILDEPTSGMDAISRRAIWDLLQR-Q 690 

Qy 260 KGNRLVLI SLH-QPRSDI FRLFDLVLLMTS GTPIYL GAAQHMVQYFTAIG 308 

| :| :::: | :|: I I : :| |: ::| II II I : 

Db 691 KSDRTIVLTTHFMDEADL— LGDRIAIMAKGELQCCGSSLFLKQKYGAGYHM TLVK 744 

Qy 309 YPCPRYSNPADF YVDLTSIDRRSREQELA TREKAQSLAALFL EKVRDL 356 

| : || I : I : : : I M : I I I I : I : : I 

Db 745 EP HCNPEDISQLVHHHVPNATLE-SSAGAELSFILPRESTHRFEGLFAKLEKKQKEL 800 

Qy 357 DDFLWKAETKDLDE DTCVESSVT PLDTNCLPS P 389 

: I : : I I : I I : : I : I : I 

Db 801 GIASFGASITTMEEVFLRVGKLVDSSMDIQAIQLPALQYQHERRASDWAVDSNLCGAMDP 860 

Qy 390 TKMPGAV QQFTTLIRRQISNDFRDLPTLLIHGAEAC — LMSM 429 

: | | : | | | : : : : : I : : I : I : 

Db 861 SDGIGALIEEERTAVKLNTGLALHCQQFWAMFLKKAAYSWREWKMV AAQVLVPLTCV 917 

Qy 430 TIGFLYFGHGSIQLSFMDTAALLFMIG ALI PFNVI-LDVI SKCYSERAMLYYELED 484 

|: I : I I I :| ::||:| : : II hi 
Db 918 TLALLAINYSS ELFDDPMLRLTLGEYGRTWPFSVPGTSQLGQQLSE HLKD 968 

Qy 485 GLYTTG — P YFFAKI LGELPE 503 

I I I ::lhl I 

Db 969 ALQAEGQEP REVLGDLEE 986 



RESULT 7 

US-08-762-500-25 

Sequence 25, Application US/08762500 
Patent No. 6030806 
GENERAL INFORMATION: 

APPLICANT: Landes, Gregory M. 
APPLICANT: Burn, Timothy C. 
APPLICANT: Connors, Timothy D. 
APPLICANT: Dackowski, William R. 
APPLICANT: Van Raay, Terence J. 
APPLICANT: Klinger, Katherine W. 

TITLE OF INVENTION: NOVEL HUMAN CHROMOSOME 16 GENES, 

TITLE OF INVENTION: COMPOSITIONS, METHODS OF MAKING AND USING SAME 
NUMBER OF SEQUENCES: 83 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: GENZYME CORPORATION 
STREET: One Mountain Road 
CITY: Framingham 
STATE: Massachusetts 
COUNTRY: United States of America 
ZIP: 01701 
COMPUTER READABLE FORM: 



MEDIUM TYPE: Floppy disk 
; COMPUTER: IBM PC compatible 

; OPERATING SYSTEM: PC-DOS/MS-DOS 

; SOFTWARE: PatentIn Release #1.0, Version #1.30

; CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/762,500

FILING DATE: 09-DEC-1996 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 
; APPLICATION NUMBER: US 08/665,259 

; FILING DATE: 17-JUN-1996 

; PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/US96/10469 

FILING DATE: 17-JUN-1996 
; ATTORNEY/AGENT INFORMATION: 

NAME: Dugan, Deborah A. 
; REGISTRATION NUMBER: 37,315 

REFERENCE/DOCKET NUMBER: IG5-9.3
TELECOMMUNICATION INFORMATION: 

TELEPHONE: (508) 872-8400 

TELEFAX: (508) 872-5415 
; INFORMATION FOR SEQ ID NO: 25: 

SEQUENCE CHARACTERISTICS: 
; LENGTH: 1684 amino acids 

; TYPE: amino acid 

TOPOLOGY: linear 
; MOLECULE TYPE: protein 
US-08-762-500-25 



Query Match 7.1%; Score 250; DB 3; Length 1684; 

Best Local Similarity 24.8%; Pred. No. 1.3e-17; 

Matches 124; Conservative 85; Mismatches 176; Indels 116; Gaps 27; 

Qy 82 NSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSP 141

| : : : | : : I I : : : I : I I : : I : : I I I I : : I : I I 

Db 523 NKDRAAVRDLNLNLYEGQITVLLGHNGAGKTTTLSMLTGL FPPTSGRAYISGYEISQ 579 

Qy 142 QLV--RKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQ 199

: I II: I I : I I I I I I I I I I : : II I : I I : 

Db 580 DMVQIRKSLGLCPQHDILFDNLTVAEHLYFYAQLK GLSRQKCPEEVKQMLHIIG 633

Qy 200 CADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLA 259 

I I | | | i | I I : : | I I : I : : | | I I I I I I I : I : : : II 

Db 634 LEDK — WNSRSRFLSGGMRRKLSIGIALIAGSKVLILDEPTSGMDAISRRAIWDLLQR-Q 690 

Qy 260 KGNRLVLISLH-QPRSDIFRLFDLVLLMTS GTPIYL GAAQHMVQYFTAIG 308 

| : | : : : : I : I : I I : : I I : : : I I I II I : 

Db 691 KSDRTIVLTTHFMDEADL--LGDRIAIMAKGELQCCGSSLFLKQKYGAGYHM TLVK 744



Qy 309 YPCPRYSNPADF YVDLTSIDRRSREQELA TREKAQSLAALFL EKVRDL 356 

I : I I I : I : : : I I I : II II : I : : I 

Db 745 EP HCNPEDISQLVHHHVPNATLE-SSAGAELSFILPRESTHRFEGLFAKLEKKQKEL 800 

Qy 357 DDFLWKAETKDLDE DTCVESSVT PLDTNCLPS — P 389 

: I ::| |:||: :|:| : I 

Db 801 GIASFGASITTMEEVFLRVGKLVDSSMDIQAIQLPALQYQHERRASDWAVDSNLCGAMDP 860 



Qy 390 TKMPGAV QQFTTLIRRQISNDFRDLPTLLIHGAEAC— LMSM 429 

: | | : | | | : : : : : | : : I : I : 

Db 861 SDGIGALIEEERTAVKLNTGLALHCQQFWAMFLKKAAYSWREWKMV AAQVLVPLTCV 917

Qy 430 TIGFLYFGHGSIQLSFMDTAALLFMIG ALIPFNVI-LDVISKCYSERAMLYYELED 484

I: I : I 11:1 : : II hi 
Db 918 TLALLAINYSS ELFDDPMLRLTLGEYGRTWPFSVPGTSQLGQQLSE HLKD 968 

Qy 485 GLYTTG--PYFFAKILGELPE 503 

I I I : : I I : I I 

Db 969 ALQAEGQEP REVLGDLEE 986 



RESULT 8 

US-08-762-500-75 

Sequence 75, Application US/08762500 
Patent No. 6030806 
GENERAL INFORMATION: 

APPLICANT: Landes, Gregory M. 
APPLICANT: Burn, Timothy C. 
APPLICANT: Connors, Timothy D. 
APPLICANT: Dackowski, William R. 
APPLICANT: Van Raay, Terence J. 
APPLICANT: Klinger, Katherine W. 

TITLE OF INVENTION: NOVEL HUMAN CHROMOSOME 16 GENES, 

TITLE OF INVENTION: COMPOSITIONS, METHODS OF MAKING AND USING SAME 
NUMBER OF SEQUENCES: 83 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: GENZYME CORPORATION 
STREET: One Mountain Road 
CITY: Framingham 
STATE: Massachusetts 
COUNTRY: United States of America 
ZIP: 01701 
COMPUTER READABLE FORM: 

MEDIUM TYPE: Floppy disk 
COMPUTER: IBM PC compatible 
OPERATING SYSTEM: PC-DOS/MS-DOS 

SOFTWARE: PatentIn Release #1.0, Version #1.30
CURRENT APPLICATION DATA:

APPLICATION NUMBER: US/08/762,500
FILING DATE: 09-DEC-1996 
CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: US 08/665,259 
FILING DATE: 17-JUN-1996 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: PCT/US96/10469
FILING DATE: 17-JUN-1996
ATTORNEY/AGENT INFORMATION:
NAME: Dugan, Deborah A.
REGISTRATION NUMBER: 37,315
REFERENCE/DOCKET NUMBER: IG5-9.3
TELECOMMUNICATION INFORMATION: 
TELEPHONE: (508) 872-8400 
TELEFAX: (508) 872-5415 
INFORMATION FOR SEQ ID NO: 75: 



; SEQUENCE CHARACTERISTICS: 
; LENGTH: 1704 amino acids

; TYPE: amino acid 

TOPOLOGY: linear 
; MOLECULE TYPE: protein 
US-08-762-500-75 



Query Match 7.1%; Score 250; DB 3; Length 1704; 

Best Local Similarity 24.8%; Pred. No. 1.3e-17; 

Matches 124; Conservative 85; Mismatches 176; Indels 116; Gaps 27; 



Qy 82 NSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSP 141
| : : : | : : I I : : : 1 : 1 1 : : 1 : : 1 1 1 1 : : 1 : 1 1
Db 543 NKDRAAVRDLNLNLYEGQITVLLGHNGAGKTTTLSMLTGL FPPTSGRAYISGYEISQ 599

Qy 142 QLV--RKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQ 199
: 1 I I : 1 1 : 1 1 1 1 1 1 1 1 1 1 : : II 1 : 1 1 :
Db 600 DMVQIRKSLGLCPQHDILFDNLTVAEHLYFYAQLK GLSRQKCPEEVKQMLHIIG 653

Qy 200 CADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLA 259
I I 1 1 1 1 1 1 1 : : 1 1 1 : 1 : : 1 1 1 1 1 1 1 1 1 : I : : * > 1
Db 654 LEDK--WNSRSRFLSGGMRRKLSIGIALIAGSKVLILDEPTSGMDAISRRAIWDLLQR-Q 710

Qy 260 KGNRLVLISLH-QPRSDIFRLFDLVLLMTS GTPIYL GAAQHMVQYFTAIG 308
| : | : : : : | : | : I I : : 1 1 : : : 1 II II 1 :
Db 711 KSDRTIVLTTHFMDEADL--LGDRIAIMAKGELQCCGSSLFLKQKYGAGYHM TLVK 764

Qy 309 YPCPRYSNPADF YVDLTSIDRRSREQELA TREKAQSLAALFL EKVRDL 356
| : 1 1 1 : 1 : : : 1 1 1 : II II : 1 : : 1
Db 765 EP HCNPEDISQLVHHHVPNATLE-SSAGAELSFILPRESTHRFEGLFAKLEKKQKEL 820

Qy 357 DDFLWKAETKDLDE DTCVESSVT PLDTNCLPS -- P 389
: 1 : : 1 1 : 1 1 : : | : | : |
Db 821 GIASFGASITTMEEVFLRVGKLVDSSMDIQAIQLPALQYQHERRASDWAVDSNLCGAMDP 880

Qy 390 TKMPGAV QQFTTLIRRQISNDFRDLPTLLIHGAEAC -- LMSM 429
: | | : | | | : : : : : | : : | : I :
Db 881 SDGIGALIEEERTAVKLNTGLALHCQQFWAMFLKKAAYSWREWKMV AAQVLVPLTCV 937

Qy 430 TIGFLYFGHGSIQLSFMDTAALLFMIG ALIPFNVI-LDVISKCYSERAMLYYELED 484
| : I : I I 1 : 1 :: 1 1 : 1 : : 1 1 1 : 1
Db 938 TLALLAINYSS ELFDDPMLRLTLGEYGRTWPFSVPGTSQLGQQLSE HLKD 988

Qy 485 GLYTTG -- PYFFAKILGELPE 503
1 1 1 : : 1 1 : 1 1
Db 989 ALQAEGQEP REVLGDLEE 1006





RESULT 9 

US-09-489-039A-10393 

; Sequence 10393, Application US/09489039A 

; Patent No. 6610836 

; GENERAL INFORMATION: 

; APPLICANT: Gary Breton et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA
; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 2709.2004001
; CURRENT APPLICATION NUMBER: US/09/489,039A
; CURRENT FILING DATE: 2000-01-27
; PRIOR APPLICATION NUMBER: US 60/117,747
; PRIOR FILING DATE: 1999-01-29
; NUMBER OF SEQ ID NOS: 14342

; SEQ ID NO 10393 

LENGTH: 265 

TYPE: PRT 
; ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-10393 



Query Match 6.9%; Score 243.5; DB 4; Length 265; 

Best Local Similarity 28.8%; Pred. No. 2.6e-18; 

Matches 66; Conservative 49; Mismatches 97; Indels 17; Gaps 4 

Qy 86 LGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVR 145 

I : I I : M : I : : : : I I I I II : : : I I : : I I : I : I : : I I 

Db 25 LALQNVSFDIVEGETISLIGHSGCGKSTLLNLIA--GITTPTEGGLLCDNREIAGPGPER 82 

Qy 146 KCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRV 205 

| | |:: III |: : :| || |:::| : :| :| ::: I 

Db 83 AWFQNH S LL PWL S C FDNVALAVDQVFRRTMS KS ERREWI EHNLARVQMGHALHKRP 139 

Qy 206 GNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLV 265 

I : I I I : : I I I I I : I : I I II II: I :l 1 = : : 

Db 140 GE ISGGMKQRVGIARALAMKPKVLLLDEPFGALDALTRAHLQDTVMHIQQELNTT 194 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314 

: : : : I I I I : I I : I : I | : | | I : 

Db 195 I VM I T H DVD EAVL LSD RVLMMT N G P AAT VG E ILAVDLPRPRH 236 



RESULT 10 

US-09-252-991A-21665 

; Sequence 21665, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18 
; PRIOR APPLICATION NUMBER: US 60/074,788 
; PRIOR FILING DATE: 1998-02-18 
; PRIOR APPLICATION NUMBER: US 60/094,190 
; PRIOR FILING DATE: 1998-07-27 
; NUMBER OF SEQ ID NOS: 33142 
; SEQ ID NO 21665 
; LENGTH: 593 
TYPE: PRT

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-21665 



Query Match 6.9%; Score 243; DB 4; Length 593;



Best Local Similarity 27.5%; Pred. No. 1.2e-17; 

Matches 74; Conservative 61; Mismatches 100; Indels 34; Gaps 9 



Qy 89 QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQ— PSSPQLVR 145 

: : : I I : : : : I I II I : : : II II : : | | : | | : I : I i I : 

Db 282 RDIDFAAARGEFVTLLGPSGCGKSTLLRCIAGL TEVDSGRILIDGEDWPLPPQ -- K 336

Qy 146 KCVAHVRQHNQLLPNLTVRETLAF-IAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTR 204 

: : I I I | | | : | | : : : | | : : : I I : : I I : I : I : I 
Db 337 RGIAMVFQSYALFPNMTVQQNVAFGLRMQKVP AAE L KQ RVAEAI E LVE LGE YA 389 

Qy 205 VGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRL 264 

I I I I I : : I I : : I : I : I : I I I I I I I : : I : : I : : I 
Db 390 --ARYPHQLSGGQCQRVALARSLVTTRPRLLLLDEPLSALDARIRKHLREQIRRIQQELGL 447 

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAI GYPCPRYSNPADFY — V 322 

: : : : I I : : I I : I : I I : : I I : I I 

Db 448 TTVFVTHDQEEALTLSDRIVLMNAGRIVQSGDAETL YTAPENAFAAGFIGNY 499 

Qy 323 DLTSIDRRSR EQELATREKAQSL 345 

:| :: II |::| I :: I 

Db 500 NLLDAEQASRLLGQPCAQQVAIRPESLRL 528 



RESULT 11 

US-09-252-991A-27569 

; Sequence 27569, Application US/09252991A 
; Patent No. 6551795 
; GENERAL INFORMATION: 

APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142

; SEQ ID NO 27569 

LENGTH: 330 

TYPE: PRT 
; ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-27569 

Query Match 6.9%; Score 242; DB 4; Length 330; 

Best Local Similarity 27.5%; Pred. No. 5.7e-18; 

Matches 85; Conservative 60; Mismatches 120; Indels 44; Gaps 11 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVR-K 146 

: I : : : : I I : : : I : : I I I I I : : I I : I I : I I : I : I I I : 

Db 19 LDNINLDIQSGELVALLGPSGCGKTTLLRIIAGL ETPDAGNIVFHGEDVSQHDVRDR 75 

Qy 147 CVAHVRQHNQLL PNLTVRET LAFI AQMRL P R — TFSQAQRDKRVEDVIAELRLRQCADTR 204 

I Ml I : : I I : : I I : I : I : : : : | : : : : : I II 



Db 76 NVGFVFQHYALFRHMTVFDNVAFGLRMK-PKGERPGESAIKAKVHELL^^y[VQLDWLAD -- 132



Qy 205 VGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRL 264 

I | | M : | : | : : : I I I I : I I I I I I : I : I : I I : I 

Db 133 RYPEQLSGGQRQRIAIARALAVEPKILLLDEPFGALDAKVRKELRRWLARLHEEINL 189

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAI GYPCPRYSNPA-DF-YV 322

: : : : : I ::: I I III I I I I I I I 

Db 190 TSVFVTHDQEEAMEVADRIWMNKGV 1 EQ I GS P GEVYEN PAS D FVYH 236 

Qy 323 DLTSIDR RSREQELATREKAQSLAALFLEKVRDLDDF — LWKAETKDLD 369 

I : I I I I : I I : I I : I I I : : : I 

Db 237 FLGDSNRLQLGNDQHLLFRPHEVSLSRSEVAEHRAA EVRDIRPLGAITRVTLKVDG 292 

Qy 370 EDTCVESSV 378 

: I : I : I 
Db 293 QDELIEAEV 301 



RESULT 12 

US-09-489-039A-11991

; Sequence 11991, Application US/09489039A 

; Patent No. 6610836 

; GENERAL INFORMATION: 

; APPLICANT: Gary Breton et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA
; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS
FILE REFERENCE: 2709.2004001
CURRENT APPLICATION NUMBER: US/09/489,039A
CURRENT FILING DATE: 2000-01-27
; PRIOR APPLICATION NUMBER: US 60/117,747
; PRIOR FILING DATE: 1999-01-29
; NUMBER OF SEQ ID NOS: 14342
; SEQ ID NO 11991
LENGTH: 379
TYPE: PRT
; ORGANISM: Klebsiella pneumoniae 
US-09-489-039A-11991 



Query Match 6.8%; Score 238; DB 4; Length 379; 

Best Local Similarity 28.0%; Pred. No. 2e-17; 

Matches 69; Conservative 48; Mismatches 89; Indels 40; Gaps 9; 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRG— HGGKIKSGQIWING-QPSSPQLV 144 

: : I : I : : I : I I I I : : : I : : I I I : : : I : I II 
Db 24 VHGIDLKIADGEFMVIVGPSGCAKSTTLRMLAGLETISGGEVRIGDKIVNNLAPKS 79 



Qy 145 RKCVAHVRQHNQLLPNLTVRETLAF-IAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADT 203 

: : I I I : I I :: I I I I I I I : : I I : I I I : : I I : I I : I 

Db 80 - RGI AMVFQN YAL YPHMTVRENLAFGLKLS KLPK AQI DRQVEEAAKI LELEELLD- 133 

Qy 204 RVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNR 263 

I I II I : : I I : : I : : I : : I I I I I I : : I I 

Db 134 RLPRQLSGGQAQRVAVGRAIVKKPDVFLFDEPLSNLD AKLRASMR 178 

Qy 264 LVLISLH QPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSN 316 



; . |[ • I • I ill i • i • • I \ i i
Db 179 IRISDLHKQLKKSGKPATTVYVTHDQTEAMTMGDRICVMKLGHIMQVDTPDNLYHQ 234

Qy 317 PADFYV 322
I : : I
Db 235 PKNMFV 240



RESULT 13 

US-09-489-039A-8815 

; Sequence 8815, Application US/09489039A 

; Patent No. 6610836 

; GENERAL INFORMATION: 

; APPLICANT: Gary Breton et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA
; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 2709.2004001
; CURRENT APPLICATION NUMBER: US/09/489,039A
; CURRENT FILING DATE: 2000-01-27
; PRIOR APPLICATION NUMBER: US 60/117,747
; PRIOR FILING DATE: 1999-01-29
; NUMBER OF SEQ ID NOS: 14342
; SEQ ID NO 8815
; LENGTH: 388
; TYPE: PRT
; ORGANISM: Klebsiella pneumoniae
US-09-489-039A-8815

Query Match 6.6%; Score 231; DB 4; Length 388; 

Best Local Similarity 27.1%; Pred. No. 1.3e-16; 

Matches 69; Conservative 53; Mismatches 87; Indels 46; Gaps 9 

Qy 86 LGIQN LSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQI 132

| : I I || : I : : : : I I I I I : : : I I : : I : III 

Db 34 LSLQNISKRFDGKPALSALSLDIHEGEFWLVGPSGCGKSTLLRLLAGL EPVSEGQI 90 

Qy 133 WINGQ PSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVE 189
I . . . » • i • • i • i • i i • i • i i * * i • • i • * i i *
Db 91 WLHNENITAATPR -- ERN FAMI FQN YAL F PHL S VRDN I T FGMKVRKE EKSSWQPRVD 145

Qy 190 DVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAH 249
Db [row lost in scan]

Qy 250 NLVKTLSRLAKGNRLVLISLHQ -- PRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAI 307
: I I | : : : | | | I I : I I : I : I : I :
Db [row lost in scan]

Qy 308 GYPCPRYSNPADFYV 322
Db 246 GRPEYLYANPANLFV 260



RESULT 14 

US-09-252-991A-20719

; Sequence 20719, Application US/09252991A 



; Patent No. 6551795 
; GENERAL INFORMATION: 

; APPLICANT: Marc J. Rubenfield et al. 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
PRIOR APPLICATION NUMBER: US 60/094,190
PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 20719
LENGTH: 370
TYPE: PRT 

; ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-20719 

Query Match 6.6%; Score 230; DB 4; Length 370; 

Best Local Similarity 27.4%; Pred. No. 1.5e-16; 

Matches 83; Conservative 55; Mismatches 121; Indels 44; Gaps 10; 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKC 147

: I : I : : I : : : I I I II : : I I : : I : I I : I : I I I : I 

Db 28 VDNVSLTINTGEFFTLLGPSGCGKTTLLRMLAG FDQPDSGEIRLNGQDLAGVEPEKR 84 

Qy 148 VAH-VRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVG 206 

III I |:::| : :|| :| :■: : : I III : ::|| II I 
Db 85 PVHTVFQSYALFPHMSVAQNIAFPLKM AGVAK S E I DARVEQAL KDVRL ADK — G 136 

Qy 207 NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVL 266

I I I I : I : I I : I I : I : I : I I I I I II : : I II : 

Db 137 GRMPTQLSGGQRQRVAIARALVNRPRLLLLDEPLSALDAKLREEMQIELINLQKDVGITF 196 

Qy 267 ISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADF Y 321

: : : : I : : I I II:: III III 

Db 197 VYWHDQGEALALSHRIAVMNQGRVEQLDAPETI YSFPRSRFVADFIGQCNL 248 

Qy 322 VDLT — S I DRRSREQELATREKAQSLAA LFLEKVRDLDDFLWKAETKD 367 

: I I : : I :| : |:| : I I I : I I :: I I 

Db 249 LDATVEAVDGERVRI DLRGLGEVQALKS FDAQPGEACVLTLRPEKI R LAQSVTAD 303

Qy 368 LDE 370 

I I 

Db 304 SDE 306 



RESULT 15 

US-09-134-000C-3584 

; Sequence 3584, Application US/09134000C 
; Patent No. 6617156 
; GENERAL INFORMATION: 

; APPLICANT: Lynn Doucette-Stamm et al 

; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO 

; TITLE OF INVENTION: ENTEROCOCCUS FAECALIS FOR DIAGNOSTICS AND THERAPEUTICS



; FILE REFERENCE: 032796-032 

; CURRENT APPLICATION NUMBER: US/09/134,000C

; CURRENT FILING DATE: 1998-08-13 

; PRIOR APPLICATION NUMBER: US 60/055,778

; PRIOR FILING DATE: 1997-08-15 

; NUMBER OF SEQ ID NOS: 6812 

; SOFTWARE: PatentIn version 3.1

; SEQ ID NO 3584 

; LENGTH: 229
; TYPE: PRT
; ORGANISM: Enterococcus faecalis
US-09-134-000C-3584

Query Match 6.5%; Score 229; DB 4; Length 229; 

Best Local Similarity 27.8%; Pred. No. 8.5e-17; 

Matches 70; Conservative 56; Mismatches 88; Indels 38; Gaps 8; 

Qy 47 LEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGS 106

INN: : : I : : : : : : : I I I I : I : : I : I I 

Db 3 LEVRDM ANVL EMKN I YKK YG EKHT EVI AL KE L S FAVQ P GE FVAVI G P 49

Qy 107 SGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRK CVAHVRQHNQLL 158 

I I I : : : I : I I I : : : I I I : I : I : : I : I : 

Db 50 SGSGKSTFLTIAAGL QAPTSGEVI VGGQ-SLNKLTKKQRLAQRFQKI GFI LQS SNLV 105 

Qy 159 PNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGER 218 



Db 106 PFLTVEDQFHLIEKVDKSRKNSELK- 155

Qy 219 RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAK-GNRLVLISLHQPRSDIF 277

• Mil I I ... MM* I I • I • - I \ I • • 1 I I I

Db 156 QRVSIACALYHEPDVILADEPTASLDTEKAFDWKLLAKEAKEKDKGIIMVTHDER--LL 213

Qy 278 RLFDLVLLMTSG 289

i i * * i

Db 214 KYCDRWRI RDG 225



Search completed: February 27, 2004, 07:20:17 
Job time : 16.2492 secs
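
The three statistics lines printed for each hit follow a fixed layout, so they can be collected into structured records for comparison across results. The following is a minimal Python sketch that assumes the lines look like those in the RESULT 14 block of this listing; the function name `parse_hit_stats` and the dictionary keys are illustrative and are not part of GenCore.

```python
import re

# Field layout is inferred from the listing itself, not from GenCore documentation.
_QUERY = re.compile(
    r"Query Match\s+([\d.]+)%;\s*Score\s+([\d.]+);\s*DB\s+(\d+);\s*Length\s+(\d+);")
_LOCAL = re.compile(
    r"Best Local Similarity\s+([\d.]+)%;\s*Pred\. No\.\s+([\d.eE+-]+);")
_COUNTS = re.compile(
    r"Matches\s+(\d+);\s*Conservative\s+(\d+);\s*Mismatches\s+(\d+);"
    r"\s*Indels\s+(\d+);\s*Gaps\s+(\d+);")


def parse_hit_stats(text):
    """Parse one hit's statistics block; return None if any line is missing."""
    q, l, c = _QUERY.search(text), _LOCAL.search(text), _COUNTS.search(text)
    if not (q and l and c):
        return None
    return {
        "query_match_pct": float(q.group(1)),
        "score": float(q.group(2)),
        "db": int(q.group(3)),
        "length": int(q.group(4)),
        "best_local_similarity_pct": float(l.group(1)),
        "pred_no": float(l.group(2)),
        "matches": int(c.group(1)),
        "conservative": int(c.group(2)),
        "mismatches": int(c.group(3)),
        "indels": int(c.group(4)),
        "gaps": int(c.group(5)),
    }


example = """Query Match 6.6%; Score 230; DB 4; Length 370;
Best Local Similarity 27.4%; Pred. No. 1.5e-16;
Matches 83; Conservative 55; Mismatches 121; Indels 44; Gaps 10;"""
stats = parse_hit_stats(example)
```

OCR damage in a statistics line (for example a letter `l` in place of the digit `1`) makes the pattern fail, in which case the sketch returns `None` instead of guessing.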



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model
Run on:         February 27, 2004, 06:44:33 ; Search time 14.9951 Seconds
                                              (without alignments)
                                              4317.206 Million cell updates/sec

Title:          US-09-989-981A-8
Perfect score:  3506
Sequence:       1 MAGKAAEERGLPKGATPQDT..........FMVLYYVSLRFIKQKPSQDW 673

Scoring table:  BLOSUM62
                Gapop 10.0 , Gapext 0.5
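
The header names a Smith-Waterman ("sw") model with a gap-open and a gap-extend penalty. The following Python sketch shows local alignment scoring with affine gaps (the Gotoh recurrences) under explicit assumptions: a gap of length k is taken to cost Gapop + k × Gapext, which may differ from GenCore's convention, and `toy_score` is a hypothetical stand-in for BLOSUM62, which is too large to reproduce here. It illustrates the technique and is not the search program's implementation.

```python
def smith_waterman_affine(a, b, score, gap_open=10.0, gap_extend=0.5):
    """Return the best local alignment score of sequences a and b."""
    neg_inf = float("-inf")
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0.0] * cols for _ in range(rows)]      # best score ending at (i, j)
    e = [[neg_inf] * cols for _ in range(rows)]  # ending with a gap in a
    f = [[neg_inf] * cols for _ in range(rows)]  # ending with a gap in b
    best = 0.0
    for i in range(1, rows):
        for j in range(1, cols):
            # Either open a new gap from h or extend an existing one.
            e[i][j] = max(h[i][j - 1] - gap_open - gap_extend,
                          e[i][j - 1] - gap_extend)
            f[i][j] = max(h[i - 1][j] - gap_open - gap_extend,
                          f[i - 1][j] - gap_extend)
            diag = h[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            # The floor at 0 is what makes the alignment local.
            h[i][j] = max(0.0, diag, e[i][j], f[i][j])
            best = max(best, h[i][j])
    return best


def toy_score(x, y):
    """Hypothetical substitution score: +5 for identity, -4 otherwise."""
    return 5.0 if x == y else -4.0


identical = smith_waterman_affine("ACDEF", "ACDEF", toy_score)
one_gap = smith_waterman_affine("ACDEFGHIK", "ACDEGHIK", toy_score)
```

Under these toy scores the second call bridges the missing residue with one gap (cost 10.5), joining two four-residue matches. The "Perfect score" in the header is presumably the query's score against itself under the real scoring table.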



Searched: 283366 seqs, 96191526 residues 
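
The throughput figure in the header can be checked against the other reported numbers. This Python sketch assumes the usual definition of a cell update (one dynamic-programming cell per query residue per database residue); the function name is illustrative.

```python
def cell_updates_per_second(query_len, db_residues, seconds):
    """Dynamic-programming cells filled per second of search time."""
    return query_len * db_residues / seconds


# Values taken from this report: 673-residue query, 96191526 database
# residues, 14.9951 seconds of search time.
rate = cell_updates_per_second(673, 96191526, 14.9951)
print(f"{rate / 1e6:.3f} Million cell updates/sec")
```

The result agrees with the reported 4317.206 Million cell updates/sec to within rounding of the printed search time.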

Total number of hits satisfying chosen parameters: 283366

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries



Database: PIR_78:*
          pir1:*
          pir2:*
          pir3:*
          pir4:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
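
The percentage figures printed with each hit appear to be simple ratios of other values in the report. The Python sketch below states those relationships as inferred from the numbers in this listing, not from GenCore documentation: Query Match as score divided by the perfect score, and Best Local Similarity as identical matches divided by the total number of aligned columns. Function names are illustrative.

```python
def query_match_percent(score, perfect_score):
    """Hit score as a percentage of the query's self-alignment score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_percent(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all aligned columns."""
    aligned_columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned_columns, 1)


# Figures from the first PIR hit (C84423): Score 735.5, perfect score 3506,
# Matches 186, Conservative 123, Mismatches 229, Indels 81.
qm = query_match_percent(735.5, 3506)
bls = best_local_similarity_percent(186, 123, 229, 81)
```

For that hit the two functions reproduce the reported 21.0% and 30.0%. The same holds for the *Pseudomonas aeruginosa* hit (Score 230, Matches 83 of 303 columns), reported as 6.6% and 27.4%.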

SUMMARIES 



Result No.   Score   Query Match   Length   DB   ID
ALIGNMENTS



RESULT 1 
C84423 

probable ABC transporter [imported] - Arabidopsis thaliana 
C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 02-Feb-2001 #sequence_revision 02-Feb-2001 #text_change 02-Feb-2001 
C;Accession: C84423

R;Lin, X.; Kaul, S.; Rounsley, S.D.; Shea, T.P.; Benito, M.I.; Town, C.D.;
Fujii, C.Y.; Mason, T.M.; Bowman, C.L.; Barnstead, M.E.; Feldblyum, T.V.; Buell,
C.R.; Ketchum, K.A.; Lee, J.J.; Ronning, C.M.; Koo, H.; Moffat, K.S.; Cronin,
L.A.; Shen, M.; VanAken, S.E.; Umayam, L.; Tallon, L.J.; Gill, J.E.; Adams,
M.D.; Carrera, A.J.; Creasy, T.H.; Goodman, H.M.; Somerville, C.R.; Copenhaver,
G.P.; Preuss, D.; Nierman, W.C.; White, O.; Eisen, J.A.; Salzberg, S.L.; Fraser,
C.M.; Venter, J.C.
Nature 402, 761-768, 1999 

A; Title: Sequence and analysis of chromosome 2 of the plant Arabidopsis 
thaliana. 

A;Reference number: A84420; MUID:20083487; PMID:10617197
A; Accession: C84423 
A; Status: preliminary 
A;Molecule type: DNA 



A;Residues: 1-725 <STO>

A;Cross-references: GB:AE002093; NID:g4262239; PIDN:AAD14532.1; GSPDB:GN00139

C; Genetics : 

A; Gene: At2g01320 

A; Map position: 2 

Query Match 21.0%; Score 735.5; DB 2; Length 725; 

Best Local Similarity 30.0%; Pred. No. 3.6e-50; 

Matches 186; Conservative 123; Mismatches 229; Indels 81; Gaps 15; 

Qy 75 WTSPSC QNS CELGIQNLS FKVRSGQMLAI I GS S GCGRAS LLDVI TG RG 122 

I : : I I : : | : | : : | : : | | | : | | | | : : | | : | : | I 

Db 72 WRNITCSLSDKSSKSVRFLLKNVSGEAKPGRLLAIMGPSGSGKTTLLNVLAGQLSLSPRL 131 

Qy 123 HGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQA 182 

I | | : : | | : | M : : : I I I I : I I I I I I I : I I : : : I I I 

Db 132 H LSGLLEVNGKPSSSKAYK — LAFVRQEDLFFSQLTVRETLSFAAELQLPEISSAE 185 

Qy 183 QRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSG 242 

: II : I : : : : I I I I I : I I : I I I : I I I I : : I : I : : I : : I : : I I I I : I 
Db 186 ERDEYWNLLLKLGLVSC7VDSCVGDAKVRGISGGEKKRLSLACELIASPSVIFADEPTTG 245 

Qy 243 LDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLG-AAQHMV 301 

11:11 : : : I I : I I : I : I : I I II : : I I : : I : I I I : I I I : : 

Db 246 LDAFQAEKVMETLQKLAQDGHTVICSIHQPRGSVYAKFDDIVLLTEGTLVYAGPAGKEPL 305

Qy 302 QYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFLW 361

II I : I I : I I I : I I I I : I II : : : : : : I I : : 

Db 306 TYFGNFGFLCPEHVNPAEFLADLISVDYSSSETVYSSQKRVHALVDAFSQR 356 

Qy 362 KAETKDLDEDTCVESSV TPLDTNCLPSPTK MPGAVQQFTTLIRR 405 

III III : II I : I I I : : I 

Db 357 SSSVLYATPLS MKEETKNGMRPRRKAIVERTDGWWRQFFLLLKR 400 

Qy 406 QISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVIL 465 

MM: : : : | : : : | | | | I I : I : I : 
Db 401 AWMQASRDGPTNKVRARMSVASAVIFGSVFWRMGKSQTSIQDRMGLL-QVAAI NTAM 456 

Qy 466 DVISKCY SERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANL 521 

: : I III:: I I t : III : I : I : I I : : : : I I : I I 

Db 457 AALTKTVGVFPKERAIVDRERSKGSYSLGPYLLSKTIAEIPIGAAFPLMFGAVLYPMARL 516

Qy 522 RPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLS 581

III : I : I II I : : I : I : I I : I I : : I 

Db 517 N PTLS RFGKFCGI WVES FAAS AMGLTVGAMVP STEAAMAVGPS LMTVFI VFGGY YVNAD 576 

Qy 582 SLWTVPAWISKVS FLRWCFEGLMKIQFS RRTYKMPLGNLT 1 AVSGDKI LSA 632 

: : I I : I : II I : I I Ml : I : : I : : I ' I 

Db 577 NTPIIFRWIPRASLIRWAFQGLCINEFSGLKFDHQNTFDVQTGEQALERLSFGGRRIRET 636 

Qy 633 MELDSYPLY AIYLIV 647 

: II MM: 
Db 637 IAAQSRILMFWYSATYLLL 655 



RESULT 2 
C86441 



probable ABC transporter [imported] - Arabidopsis thaliana 
C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 31-Mar-2001 
C;Accession: C86441 

R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.
Nature 408, 816-820, 2000 

A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.;
Langin-Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.; Liu,
S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.; Miranda,
M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.; Pham, P.K.;
Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.

A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.; van Aken,
S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.; Fraser, C.M.;
Venter, J.C.; Davis, R.W.

A;Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis. 

A;Reference number: A86141; MUID:21016719; PMID:11130712

A;Accession: C86441

A; Status: preliminary 

A; Molecule type: DNA 

A; Residues: 1-646 <STO> 

A;Cross-references: GB:AE005172; NID:g11136734; PIDN:AAG31315.1; GSPDB:GN00141
C; Genetics : 
A;Map position: 1 

C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology 

Query Match 20.6%; Score 723.5; DB 2; Length 646; 

Best Local Similarity 30.6%; Pred. No. 2.7e-49; 

Matches 208; Conservative 119; Mismatches 262; Indels 91; Gaps 20; 
III: : : I : I I I : I I : :: : : | : | : I I : I

GLPD-MSDTQSKSVLAFPTITSQPGLQMSMYPITLKEWYKVKI EQTSQCMGS 71 



II : : : : I I : I I : : I I I I : : I I : I I I I I : : 

WKSKE KTILNGITGMVCPGEFLAMLGPSGSGKTTLLSALGGR — LSKTFSGKVMY 124 



I I I I I : I : I I : I I : I I I I I I I I : I I I :::::: I : I I I I 



I I : I :: : I I I : I I I I : : I I I I I :: I II : I : I I I I I I I I I I III : I I 



I I I I I : : : I I I I I : : I I I : I : : I : I I ! I I |:|| 



Qy 315 SNPADFYVDLTS IDRRSREQELATREKAQSLAALFLEKVRDLDDFLWKAETKDLD 369 

Mil :|| : : : | | | | : ::| : : : : || | 

Db 304 VNPADLLLDLANGI PPDTQKETSEQEQKTVK — ETLVSAYEKNI STK-LK 350 

Qy 370 EDTCVESS VTPLDTNCLPSPTKMPGAVQQFTTLIRRQI-SNDFRDLPTLLIHGAEA 424

: I I I II | | | | :: | : I II 
Db 351 AELCNAESHSYEYTKAAAKNLKSEQWCTTWWYQFTVLLQRGVRERRFESFNKLRIF Q 407 

Qy 425 CLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYELED 484

: : I I : I : : I I I I I I : : : I : I I I 

Db 408 VISVAFLGGLLWWH-TPKSHIQDRTALLFFFSVFWGFYPLYNAVFTFPQEKRMLIKERSS 466 

Qy 485 GLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRI 544

|:| I I I : : I : I I I : I I : hi I : I I : I I : 

Db 467 GMYRLSSYFMARNVGDLPLELALPTAFVFIIYWMGGLKPDPTTFILSLLVVLYSVLVAQG 526 

Qy 545 MALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVP AWISKVSFLRWCFE 601

: I I III I : : : I : I I I : : : I I : : I : : I : : 

Db 527 LGLAFGALLMNIKQATTLASVTTLVFLIAGGYYVQ QIPPFIVWLKYLSYSYYCYK 581 

Qy 602 GLMKIQFSRRTY KMPLGNLTIAVSGDKILSAMELDSYPLYAI 643 

I : I I : : I I I I I I I : : I I I I 

Db 582 LLLGIQYTDDDYYECSKGVWCRVGDFPAIKSMGLNNLWI DVFVMGVMLVGYRLMA- 636 

Qy 644 YLIVIGLSGGFMVLYYVSLR 663

: I I : I I I 

Db 637 YMALHRVKLR 646 



RESULT 3 
E96742 

probable ABC transporter F17M19.11 [imported] - Arabidopsis thaliana 
C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 23-Mar-2001 
C;Accession: E96742 

R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.
Nature 408, 816-820, 2000 

A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.;
Langin-Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.; Liu,
S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.; Miranda,
M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.; Pham, P.K.;
Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.

A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.; van Aken,
S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.; Fraser, C.M.;
Venter, J.C.; Davis, R.W.

A; Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis. 
A; Reference number: A86141; MUID: 21016719; PMID: 11130712 
A; Accession: E96742 
A; Status: preliminary 
A;Molecule type: DNA 



A; Residues : 1-609 <STO> 

A;Cross-references: GB:AE005173; NID:g6978921; PIDN:AAF34313.1; GSPDB:GN00141

C; Genetics : 

A; Gene: F17M19.11 

A;Map position: 1 

C; Superfamily : fruit fly white protein; ATP-binding cassette homology 

Query Match 20.0%; Score 700; DB 2; Length 609; 

Best Local Similarity 31.8%; Pred. No. 1.8e-47; 

Matches 210; Conservative 103; Mismatches 225; Indels 122; Gaps 22; 

Qy 76 TSPSCQNSCELGI-QNLSFKVRS GQMLAIIGSSGCGRASLLDVI 118 

: : I I Ihl I : I I I : : I : : I I I I : : : I I : : 

Db 2 SNDSCNIKKLLGLKQKPSDETRSTEERTILSGVTGMISPGEFMAVLGPSGSGKSTLLNAV 61 

Qy 119 TGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRT 178

I I I I : : I : I I I : I : : : I I : I I : I I I I I I I I : I : I I I I : 

Db 62 AGRLHGSNL-TGKILINDGKITKQTLKR-TGFVAQDDLLYPHLTVRETLVFVALLRLPRS 119 

Qy 179 FSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDE 238 

: : : : I I I : I I I : I : I I I I : : I I : I I I I I : I I I I : I I I I : I : I I I 
Db 120 LTRDVKLRAAESVISELGLTKCENTWGNTFIRGISGGERKRVSIAHELLINPSLLVLDE 179 



Qy 239 PTSGLDSFTAHNLVKTLSRLAKG-NRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAA 297 

MINI: I I I : I I : I I I : I : I : I I I I : I : : I I I I I : : I : : : I 
Db 180 PTSGLDATAALRLVQTLAGLAHGKGKTWTSIHQPSSRVFQMFDTVLLLSEGKCLFVGKG 239

Qy 298 QHMVQYFTAI GYPCPRYSNPADFYVDLTSI DRRSREQELATREK AQSLAALFLEKVR 354 

: : I I : : I : I I I I I : I I : : : III I : I : 
Db 240 RDAMAYFESVGFSPAFPMNPADFLLDLA -- NGVCQTDGVTEREKPNVRQTLVTAY 292



Qy 355 DLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLI 403

II : I I I : I I I I I I : : I III 

Db 293 DTLLAPQVK TCIEVSHFPQD-NARFVKTRVNGG — GITTCIATWFSQLCILL 341 

Qy 404 RRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMI — 455 

I I I I I : : | : | ::: | : I I I I I 

Db 342 HRLLKERRHESFD LLRI FQVVAASILCGLMWW-HSDYR-DVHDRLGLLFFI SI 392 



Qy 456 — GALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGM 513 

MM: Ml: I I : I I I I I : M I : 

Db 393 FWGVL P S FN AVFT F PQERAI FTRERASGMYTLSSYFMAHVLGSLSMELVLPAS FLT 448



Qy 514 PTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLA 573

III: I I I I : II I I : : I I : : I I I : II Ml 

Db 449 FTYWMVYLRPGIVPFLLTLSVLLLYVLASQGLGLALGAAIMDAKKASTIWW 508 

Qy 574 GGFMINLSSLWTVPA WI S KVS FLRWCFEGLMKI QFS RRT YKMPLGNLT I AVS GDKI L 630 

I I : : I II: MM : I : I : I I : I I : : M 

Db 509 GGYYVN KVP S GMVWMKYVS TT F YC YRLLVAI Q YG SGEEIL 548

Qy 631 SAMELDSYPLYA IYLIVIGLSG GFMVLYYVSLRFIK 666 

: I I : I I I I I : I I I : M I I I 

Db 549 RMLGCDSKGKQGASAATSAGCRFVEEEVIGDVGMWTSVGVLFLMFFGYRVLAYLALRRIK 608 



RESULT 4 



T46101 

ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T25B15.80 

C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 04-Feb-2000 #sequence_revision 04-Feb-2000 #text_change 04-Feb-2000 
C; Accession: T46101 

R;Alcaraz, J.P.; Clabault, G.; Cottet, A.; Mache, R.; Mewes, H.W.; Lemcke, K.;

Mayer, K.F.X.; Quetier, F. ; Salanoubat, M. 

submitted to the Protein Sequence Database, January 2000 

A;Reference number: Z23021 

A; Accession: T46101 

A; Status: preliminary 

A; Molecule type: DNA 

A; Residues: 1-737 <ALC> 

A;Cross-references: EMBL : AL132972 

A; Experimental source: cultivar Columbia; BAC clone T25B15 
C; Genetics : 
A;Map position: 3 

A;Introns: 122/1; 146/3; 225/2; 277/2; 338/3; 422/2; 535/1; 628/3; 664/3 
A;Note: T25B15.80 

Query Match 19.1%; Score 668.5; DB 2; Length 737; 

Best Local Similarity 28.0%; Pred. No. 7.7e-45; 

Matches 189; Conservative 134; Mismatches 258; Indels 93; Gaps 16; 

Qy 25 DRLFSSESDNSLYFTYSGQPN TLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQ 81 

I I I : I : : I | : I : I : I I : 

Db 120 DILEDIEAATSSWKFQAEPTFPIYLKFIDITYKV TTKGMT 160 

Qy 82 NSCELGIQN-LSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSS 140

: I I I I : I | : : M : : | | | | : : | | : : I | : I I : I : I I 

Db 161 SSSEKSILNGISGSAYPGELLALMGPSGSGKTTLLNALGGRFNQQNI-GGSVSYNDKPYS 219 

Qy 141 PQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQC 200 

I : : | | : | | : | | | : | | | : | : | | | : | :::::: | I I I I I : I 
Db 220 KHLKTR-IGFVTQDDVLFPHLTVKETLTYTALLRLPKTLTEQEKEQRAASVIQELGLERC 278

Qy 201 ADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAK 260 

I I : I : I I I : I I I I I : I I I I ::: I I : I : I I I I I I I I I I I : I : I : I I 
Db 279 QDTMIGGSFVRGVSGGERKRVCIGNEIMTNPSLLLLDEPTSSLDSTTALKIVQMLHCIAK 338 

Qy 261 GNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADF 320

: : : : : I I I I : I I I : : : : : I : : I I I : I I : : I I I I I : I 

Db 339 AGKTIVTTIHQPSSRLFHRFDKLWLSRGSLLYFGKASEAMSYFSSIGCSPLLAMNPAEF 398 

Qy 321 YVDLTSIDRRSREQELATREKAQSL-AALFLEKVRDLDDFLWKAETKDLDEDTCVESSVT 379

: I I : : I : I I : : I :: I : I I : I : I :: : 

Db 399 LLDLVNGNMNDISVPSALKEKMKIIRLELYVRNVK CDVETQYLEE — AYKTQIA 450 

Qy 380 PLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSM 429 

: : I : i : I : I : I : I : I I : I : 

Db 451 VMEKMKLMAPVPLDEEVKLMITCPKREWG LSWWEQYCLLSLRGIKERRHDY 501 

Qy 430 TIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475 

: I I :: I : I I I I I : I II 

Db 502 FSWLRVTQVLSTAIILGLLWW-QSDITSQRPTRSGLLFFIAVFWGFFPVFTAI FTFPQER 560 



Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535 

III II : I III: : I I : : : : I : : I I I : I I II 

Db 561 AMLSKERESN1WRLSAYFVARTTSDLPLDLILPVLFLVVWFMAGLRLRAESFFLSVLTV 620 

Qy 536 WLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVP AWISK 592

: I : : : I I II I : :: : I I I I I : : II III 

Db 621 FLCIVAAQGLGIAIGASmDLKKATTLASVTVMTFMLAGGYFVK KVPFFIAWIRF 675 

Qy 593 VSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSG 652 

: I I : : | : | : | : : : I : | : : | | : : : I : : : I 
Db 676 MS FN YHT YKLLVKVQ YE EIMESVNGEEIESGLK EVSALVAMII 718 

Qy 653 GFMVLYYVSLRFIK 666 

I : : : I I I I : I 
Db 719 GYRLVAYFS LRRMK 732 



RESULT 5 
FYFFW 

white protein - fruit fly (Drosophila melanogaster ) 
C; Species: Drosophila melanogaster 

C;Date: 31-Dec-1990 #sequence_revision 17-Feb-1995 #text_change 19-Jan-2001 
C;Accession: S08635; S07263; S10240 
R;Pepling, M. ; Mount, S.M. 
Nucleic Acids Res. 18, 1633, 1990 

A; Title: Sequence of a cDNA from the Drosophila melanogaster white gene. 
A;Reference number: S08635; MUID:90221897; PMID:2109311
A;Accession: S08635 
A;Molecule type: mRNA 
A; Residues: 1-687 <PEP> 

A;Cross-references: EMBL:X51749; NID:g8825; PIDN:CAA36038.1; PID:g8826
R;O'Hare, K.; Murphy, C.; Levis, R.; Rubin, G.M.
J. Mol. Biol. 180, 437-455, 1984 

A; Title: DNA sequence of the white locus of Drosophila melanogaster. 
A;Reference number: S07263; MUID:85134865; PMID:6084717
A;Accession: S07263 
A;Molecule type: DNA 

A;Residues: 1-24,'LIFEIPYHCRVTAD',30-334,'ITLHLNSYPAWVPSVLPTTIRRTFTYRCWPLCPDGRSSPVIGSPRYG',372-687 <OHA1>

A;Cross-references: EMBL:X02974 

A; Experimental source: strain Canton S 

R;O'Hare, K.

submitted to the EMBL Data Library, June 1985 
A; Reference number: S10240 
A; Accession: S10240 
A; Molecule type: DNA 

A;Residues: 1-24,'LIFEIPYHCRVTAD',30-687 <OHA2>

A;Cross-references: EMBL:X02974; NID:g10873; PIDN:CAA26716.1; PID:g10874

A; Experimental source: strain Canton S 

C;Genetics : 

A; Gene: white; w 

A;Cross-references: FlyBase:FBgn0003996
A;Introns: 24/3; 116/1; 334/2; 439/3; 483/3 

C;Superfamily: fruit fly white protein; ATP-binding cassette homology
C;Keywords: ATP; glycoprotein; nucleotide binding; P-loop; transmembrane protein
F;113-317/Domain: ATP-binding cassette homology <ABC>
F;130-137/Region: nucleotide-binding motif A (P-loop)
F;261-265/Region: nucleotide-binding motif B
F;67,93,472,554,651/Binding site: carbohydrate (Asn) (covalent) #status predicted

Query Match 18.7%; Score 656; DB 1; Length 687; 

Best Local Similarity 30.3%; Pred. No. 6.9e-44; 

Matches 178; Conservative 113; Mismatches 265; Indels 32; Gaps 10; 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGG — KIKSGQIWINGQPSSPQLVR 145 

: : | : I : : I I : : I I I I I : : I I : : I I II HIM : :: 

Db 113 LKNVCGVAYPGELLAVMGSSGAGKTTLLNALAFRSPQGIQVSPSGMRLLNGQPVDAKEMQ 172

Qy 146 KCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRV 205

I : I : I : : : I I II I I I : I : I I : II I I : I I I I I : I I : 
Db 173 ARCAYVQQDDLFIGSLTAREHLIFQAMVRMPRHLTYRQRVARVDQVIQELSLSKCQHTII 232 

Qy 206 G-NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRL 264 

I I : I I I I M I : I : : : I : I : I I I I I I I I I M I I I I : : I : I : I : : ' 

Db 233 GVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFTAHSWQVLKKLSQKGKT 292 

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDL 324

I : : : : I I I I : : I I I I : I I I I : I I I : I : : I M I I I I I I I : 

Db 293 VILTIHQPSSELFELFDKILLMAEGRVAFLGTPSEAVDFFSYVGAQCPTNYNPADFYVQV 352

Qy 325 TSIDRRSREQELATREKAQSLAALF-LEKV-RDLDDFLWKAETKDLDEDTCVESSVTPLD 382 

: : : I : : I : : : I : I I I I : : I I I : I : : I I : 

Db 353 LAV VP GRE I E S RD RI AK I CDN FAI S KVARDMEQ L L ATKNLEK PLE 397 

Qy 383 TNCLPSP TKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGH 438 

| | I I : : I : : : : : : : : : : I I : : I 

Db 398 QPENGYTYKATWFMQFRAVLWRSWLSVLKEPLLVKVRLIQTTMVAILIGLIFLGQ 452 

Qy 439 GSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKIL 498 

I : | : : I : : I : I : I I : I II III: 

Db 453 QLTQVGVMNINGAIFLFLTNMTFQNVFATINVFTSELPVFMREARSRLYRCDTYFLGKTI 512 

Qy 499 GELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHM 558

Ml : : : I : I I I : I I I I I : =1 

Db 513 AELPLFLTVPLVFTAIAYPMIGLRAGVLHFFNCLALVTLVANVSTSFGYLISCASSSTSM 572 

Qy 559 AS FFSNALYNSFYLAGGFMINLS SLWTVPAWI SKVS FLRWCFEGLMKIQFS RRTYKM 615 

| : ||||1:||: | : I : I : I : I I I : I : : 

Db 573 ALSVGPPVIIPFLLFGGFFLNSGSVPVYLKWLSYLSWFRYANEGLLINQWADVEPGEISC 632 

Qy 616 PLGNLTIAVSGDKILSAMELDSYPLYAI YLIVIGLSGGFMVLYYVSLR 663 

II II II : : I I: : I III l-M 

Db 633 TSSNTTCPSSGKVILETLNFSAADLPLDYVGLAILIVSFRVLAYLALR 680 



RESULT 6 
JC7860 

brain multidrug resistance protein, BMDP - pig 
C; Species: Sus scrofa domestica (domestic pig) 

C;Date: 18-Nov-2002 #sequence_revision 18-Nov-2002 #text_change 31-Mar-2003 
C; Accession: JC7860 
R;Eisenblaetter, T.; Galla, H.J. 

Biochem. Biophys. Res. Commun. 293, 1273-1278, 2002



A; Title: A new multidrug resistance protein at the blood-brain barrier. 

A;Reference number: JC7860; MUID:22050127; PMID:12054514

A;Accession: JC7860

A;Molecule type: mRNA 

A; Residues: 1-656 <EIS> 

A;Cross-references : GB:AJ420927 

A; Experimental source: brain 

C;Comment: This protein, a new transport protein of the ATP-binding cassette
(ABC) superfamily of transporters, expressed in porcine brain capillary
endothelial cells, plays an important role in the exclusion of xenobiotics from
the brain and participates in drug transport across the blood-brain barrier and
therefore is considered as an efflux pump at the cerebral endothelium.
C;Genetics:
A;Gene: bmdp

Query Match 18.6%; Score 653; DB 2; Length 656; 

Best Local Similarity 27.6%; Pred. No. 1.1e-43;

Matches 192; Conservative 133; Mismatches 270; Indels 100; Gaps 20; 



Qy 18 QDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTS 77

:: 1 : 1 1 1 1 1 : 1 1 | : | : I : I :

Db 15 RNTNGL PGSSSNELKTSAGGA -- VLSFHDICYRVKVKSGFLF 54

Qy 78 PSCQNSCELGI-QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR--GHGGKIKSGQIWI 134

I : : 1 I I : : : : 1 : 1 1 : 1 : 1 1 : : 1 1 1 1 1 : 1 II 1 1 : 1

Db 55 --CRKTVEKEILTNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPHG LSGDVLI 108

Qy 135 NGQPSSPQLVRKC-VAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIA 193

Ml II : 1 1 : : : 1 1 1 1 1 1 1 1 : 1 1 1 1 : : : : : 1 : II

Db 109 NGAPRPANF--KCNSGYWQDDWMGTLTVRENLQFSAALRLPTTMTNHEKNERINMVTQ 166

Qy 194 ELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVK 253

II | : M : : I I : : I 1 : 1 1 1 1 1 : 1 II : : 1 : : 1 II 1 1 1 1 1 : II 1 1 1 1 :

Db 167 ELGLDKVADSKVGTQFIRGVSGGERKRTSIAMELITDPSILFLDEPTTGLDSSTANAVLL 226

Qy 254 TLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPR 313

1 I : : I 1 : : 1 : 1 1 1 1 1 1 : 1 1 1 : 1 : 1 1 : : 1 1 : : 1 1 : 1 1 1 1

Db 227 LLKRMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAREALGYFASIGYNCEP 286

Qy 314 YSNPADFYVDLTS IDRRSREQELATREK AQSLAA LFLEK 352

1 : 1 1 1 1 1 : : 1 : : : 1 1 : : 1 : 1 1 1 1 :

Db 287 YNNPADFFLDVINGDSSAWLSR7VDRDEGAQEPEEPPEKDTPLIDKLAAFYTNSSFFKDT 346

Qy 353 VRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFR 412

: 1 1 1 : 1 : 1 1 1 : 1 | : | | |

Db 347 KVELDQFSGGRKKK KSSVYKEVTYTTSFC HQLRWI SRRSFKNLLG 391

Qy 413 DLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCY 472

: : : : : | | : : : : : 1 : 1 1 : : : I :

Db 392 NPQASVAQI IVTII LGLVIGAI FYDLKNDPSGIQNRAGVLFFL TTNQCF 440

Qy 473 S ERAMLYYELEDGLYTTGPYFFAKILGE-LPEHCAYIIIYGMPTYWLANL 521

I I : : : 1 II 1 1 1 1 : 1 : 1 1 1 1 : M : 1 1

Db 441 SSVSAVELLWEKKLFIHEYISGYYRVSSYFFGKLLSDLLPMRMLPSIIFTCITYFLLGL 500

Qy 522 RPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLS 581

: I : I :



Db 501 KPAVGSFFIMMFTLMMVAYSASSMALAIAAGQSWSVATLLMTISFV^ 560 



Qy 582 SLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLT IAVSGDKIL — SA 632 

:: : I : I I : I I : I : : I I : I : I : I 

Db 561 TWPWLSWLQYFSIPRYGFSALQYNEFLGQNF-CPGLNVTTNNTCSFAICTGAEYLENQG 619 

Qy 633 MELDSYPLYAIYLIVIGLSGGFMVLYYVSLRFIKQ 667 

: | : : I : :: : : I : : I : I : I : 
Db 620 I SLSAWGLWQNHVALACMMVI FLT I AYLKLLLLKK 654 



RESULT 7 
S77690 

probable membrane protein YOL075c - yeast ( Saccharomyces cerevisiae) 

N;Alternate names: hypothetical protein O1125; hypothetical protein O1130;

hypothetical protein YOL074c 

C; Species: Saccharomyces cerevisiae 

C;Date: 21-Apr-1997 #sequence_revision 09-May-1997 #text_change 19-Apr-2002 

C;Accession: S77690; S66767; S66768 

R;Alexandraki, D.; Katsoulou, C. ; Tzermia, M. 

submitted to the Protein Sequence Database, July 1996 

A;Reference number: S66756

A;Accession: S77690 

A;Molecule type: DNA 

A; Residues: 1-1294 <ALE> 

A;Cross-references: EMBL:Z74816; MIPS:YOL075c

A; Note: this is a revision to the sequence from reference S66756 
A;Accession: S66767 
A;Molecule type: DNA 

A;Residues: 1-179,'TTRTGVFLVVKRED' <ALW>

A;Cross-references: EMBL:Z74816

A; Experimental source: strain S288C 

A;Note: this sequence has been revised in reference S77690 

A;Note: this was assumed to be protein YOL074c 

A;Accession: S66768 

A;Molecule type: DNA 

A; Residues: 200-1294 <ALF> 

A;Cross-references: EMBL: Z74817 

A;Experimental source: strain S288C

A; Note: this sequence has been revised in reference S77690 

A;Note: this was assumed to be the complete sequence of protein YOL075c 

C; Genetics : 

A;Cross-references: SGD:S0005435 
A; Map position: 15L 
A;Note: YOL075c 

C;Superfamily: unassigned ATP-binding cassette proteins; ATP-binding cassette
homology 

C; Keywords: ATP; nucleotide binding; P-loop; transmembrane protein 

F;45-263/Domain: ATP-binding cassette homology <ABC1>
F;62-69/Region: nucleotide-binding motif A (P-loop)
F;376-392/Domain: transmembrane #status predicted <TM1>
F;469-485/Domain: transmembrane #status predicted <TM2>
F;496-512/Domain: transmembrane #status predicted <TM3>
F;606-622/Domain: transmembrane #status predicted <TM4>
F;710-916/Domain: ATP-binding cassette homology <ABC2>
F;727-734/Region: nucleotide-binding motif A (P-loop)
F;1042-1058/Domain: transmembrane #status predicted <TM5>
F;1125-1141/Domain: transmembrane #status predicted <TM6>
F;1177-1193/Domain: transmembrane #status predicted <TM7>
F;1269-1285/Domain: transmembrane #status predicted <TM8>

Query Match 18.6%; Score 653; DB 2; Length 1294; 

Best Local Similarity 30.1%; Pred. No. 2.9e-43; 

Matches 171; Conservative 111; Mismatches 239; Indels 48; Gaps 13; 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQI 132 

: I : | | : : | : : I I I I : : I I : I : : II : I I 
Db 45 WTFSMDLPSGSVKAVMGGSGSGKTTLLNVLASKISGGLTHNGSIRYVLEDTGSEPNETE 104 

Qy 133 WINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKR- 187 

: : I I I : I : : I : I I I I I I I I I I : : I : : : I I : 

Db 105 PKRAHLDGQ-DHPIQKHVIMAYLPQQDVLSPRLTCRETLKFAADLKL NSSERTKKL 159 

Qy 188 -VEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSF 246 

II : I II I : I I I I II: I I I I I II : II : I I I I : : II I : I I I I I : I I I : : 

Db 160 MVEQLI EELGLKDCADTLVGDNSHRGLSGGEKRRLSI GTQMI SNPS IMFLDEPTTGLDAY 219 

Qy 247 TAHNLVKTLSRLAK-GNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFT 305 

: I : : | I I : I I I I : : I : I I I I II I I I I : : : I : I : : I I 

Db 220 SAFLVI KTLKKLAKEDGRTFIMSIHQPRSDILFLLDQVCILSKGNWYCDKMDNTIPYFE 279 

Qy 306 AIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFLWKAET 365

: I I I I : I I I I : : : I I : I : I I I : : I I I : : II : : I = = 

Db 280 SIGYHVPQLVNPADYFIDLSSVDSRSDKEEAATQSRLNSL IDHWHD YERTH 330 

Qy 366 KDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEAC 425 

I :: : | : | : : I :: I : I I I I I : I I I I : II 

Db 331 LQLQAESYI-SNATEIQIQNM — TTRLP-FWKQVTVLTRRNFKLNFSDYVTLISTFAEPL 386 

Qy 426 LMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIP — FNVILDVISKCYSERAMLYYELE 483 

:: |::|: : : I :: :: I | : |: I 

Db 387 IIGTVCGWIYYKPDKSSIGGLRTTTACLYASTILQCYLYLLFDTYRLCEQDIALYDRERA 446 

Qy 484 DGLYTTGPYFFA-KILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCC 542

: I I : I I I I : I : I : I I : : I : I I : I : I I 
Db 447 EGS VT PLAFI VARKI S LFLS DDFAMTMI FVS I T YFMFGLEADARKFFYQFAWFLCQLS C 506 

Qy 543 RIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEG 602

: : : : I : III I : : I I : I : | | : : | : | 

Db 507 S GLSMLS VAVS RDFS KAS LVGNMT FTVLSMGCGFFWAKVMPVYVRWI KYI AFTWYS FGT 566 

Qy 603 LMKIQFSRR TYKMPLGNLTIAVSG 626 

II I : I I I : I I 

Db 567 LMS STFTNS YCTTDNLDECLGNQI LEVYG 595 



RESULT 8 
T08934 

hypothetical protein F27G19.20 - Arabidopsis thaliana 
C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 11-Jun-1999 #sequence_revision 11-Jun-1999 #text_change 17-Mar-2000
C;Accession: T08934 

R;Bevan, M.; Hilbert, H.; Braun, M.; Holzer, E.; Brandt, A.; Duesterhoeft, A.;
Bancroft, I.; Mewes, H.W.; Mayer, K.F.X.; Lemcke, K. ; Schueller, C. 



submitted to the Protein Sequence Database, May 1999 
A; Reference number: Z16519 
A;Accession: T08934 
A; Molecule type: DNA 
A; Residues: 1-635 <BEV> 

A;Cross-references: EMBL:AL078467; GSPDB:GN00062; ATSP:F27G19.20
A;Experimental source: cultivar Columbia; BAC clone F27G19
C;Genetics:

A;Gene: ATSP:F27G19.20
A;Map position: 4 

A;Introns: 38/3; 253/1; 304/1; 414/3 

C;Superfamily: fruit fly white protein; ATP-binding cassette homology

Query Match 18.6%; Score 651.5; DB 2; Length 635; 

Best Local Similarity 31.1%; Pred. No. 1.4e-43; 

Matches 191; Conservative 104; Mismatches 240; Indels 79; Gaps 19; 

Qy 30 SESDNSLYFTY S GQ PNT LEVRDLN YQVDLAS QVPW FEQLAQ FKMPWT S P S CQN S CEL 86 

: I I I I : : I I I : : I I I I I : : 

Db 17 TNDDRSLPFS I FKKANNPVTLKFENLVYTVKLKDSQGCF GKNDKTEERT 65 

Qy 87 GIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIK-SGQIWINGQPSSPQLVR 145 

: : I : I : I : : I I : : I I I I : I I I : I I I I I : I I I : I I : I : 
Db 66 ILKGLTGIVKPGEILAMLGPSGSGKTSLLTALGGRVGEGKGKLTGNISYNNKPLS-KAVK 124 

Qy 146 KCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRV 205

: I I : I I I I I I I I I I I : I I I : I : : : I : : I : I I I : I I I : 

Db 125 RTTGFVTQDDALYPNLTVTETLVFTALLRLPNSFKKQEKIKQAKAVMTELGLDRCKDTII 184 

Qy 206 GNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLV 265 

I : : I I : I I I I I : I I I I I : : I I I : I I I I I I I I I I I I I : I I I I : I I I 
Db 185 GGPFLRGVSGGERKRVSIGQEILINPSLLFLDEPTSGLDSTTAQRIVSILWELARGGRTV 244 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGY-PCPRYSNPADFYVDL 324

: : : I I I : I I : I I : : I I : : I I I I I : I I : I : 

Db 245 VTTIHQP SKGNPVYFGLGSNAMDYFASVGYSPLVERINPSDFLLDI 290 

Qy 325 TSIDR RSREQELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTC V 374

: I : : | : | I : I I I : : : : I II 
Db 291 ANGKPLLVISCWPSVGSDESQRPEAMKAALVAFYKTNLLDSVINEVKGQD DLCNKPR 347 

Qy 375 ESSVTPLDTNCLPS-PTKMPGAVQQFTTLIRRQIS NDFRDLPTLLIHGAEACLMSM 429 

III : I I II | | | | : : | : : I : I : : : I 

Db 348 ESS — RVATNTYGDWPTTW WQQFCVLLKRGLKQRRHDSFSGMKV AQIFIVSF 397 

Qy 430 TI GFLYFGHGS I QLS - FMDTAALLFMI GALI P FNVI LDVI S KCYSERAMLYYELEDGLYT 488 

II:: : : I I MM: I : I I I I I I I I : I 

Db 398 LCGLLWW QTKI S RLQDQIGLLFFI SS FWAFFPLFQQI FTFPQERAMLQKERS SGMYR 454 

Qy 489 TGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALA 548

I I I : : : : I : I I : : I I I : I I I I : I : : : I : I I 

Db 455 LSPYFLSRWGDLPMELILPTCFLVITYWMAGLNHNLANFFVTLLVLLVHVLVSGGLGLA 514 

Qy 549 AAALLPTFHMAS FFSNALYNS FYLAGGFMINLS SLWTVPAWI SKVS FLRWCFEGLMKIQF 608 

II: I : : : : I I I I I : : I I : II : : : 
Db 515 LGALVMDQKSATTLGSVIMLTFLLAGGYYVQ HVPVFISWIKY VSI 559 



Qy 609 SRRTYK-MPLGNLT 621 

I I I : I I I 
Db 560 GYYTYKLLI LGQYT 573 



RESULT 9 
G02068 

white homolog - human

C; Species: Homo sapiens (man) 

C;Date: 21-Dec-1996 #sequence_revision 06-Jun-1997 #text_change 02-Feb-2001 
C;Accession: G02068 

R;Croop, J.M.; Tiller, G.; Fletcher, J.A.; Lux, M.; Raab, E.; Goldenson, D.;
Arciniegas, S.; Son, D.; Wu, R.

submitted to the EMBL Data Library, August 1995 
A; Reference number: H00769 
A; Accession: G02068 

A; Status: preliminary; translated from GB/EMBL/DDBJ 
A; Molecule type: mRNA 
A; Residues: 1-638 <CRO> 

A;Cross-references: EMBL:U34919; NID:g1314276; PIDN:AAC51098.1; PID:g1314277
C;Genetics:
A;Gene: white
C;Superfamily: fruit fly white protein; ATP-binding cassette homology
C;Keywords: ATP; nucleotide binding; P-loop
F;61-253/Domain: ATP-binding cassette homology <ABC>
F;78-85/Region: nucleotide-binding motif A (P-loop)

Query Match 17.6%; Score 618; DB 2; Length 638; 

Best Local Similarity 25.7%; Pred. No. 6.5e-41; 

Matches 173; Conservative 130; Mismatches 266; Indels 104; Gaps 18; 



Qy 33 DNSLYFT -- YSGQPN T LEVRDLN YQVDLAS QVPWFEQLAQ FKMPWT S P S CQN S CEL 86
| I : I : I I : 1 1 1 1 : 1 1 : II : : :
Db 17 DNNLTEAQRFS SLPRRAAVNI EFRDLS YS V PEGPWWRKKGYKTL 60

Qy 87 GIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRK 146
: : : | | | | : : : I 1 : 1 1 1 1 : : : 1 : : : : 1 1 1 : 1 1 1 1 : II
Db 61 -LKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETG--MKGAVLINGLPRDLRCFRK 117

Qy 147 CVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVG 206
:: I : I I 1 : 1 1 1 : 1 : 1 ::| 1 : |:::: II Ihll 1
Db 118 VSCYIMQDDMLLPHLTVQEAMMVSAHLKLQE--KDEGRREMVKEILTALGLLSCANTRTG 175

Qy 207 NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVL 266
: | I I I : I : I : : I : : 1 : 1 1 : : 1 1 1 1 1 1 1 1 1 : : 1 : 1 1 : 1 1 : :
Db 176 S L S GGQRKRLAI ALELVNN P P VMFFDE PT S GLD S AS C FQWS LMKGLAQGGRS 1 1 230

Qy 267 ISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTS 326
: : | | | : : I I I I : : : : 1 : I 1 : : 1 1 : 1 1 1 1 1 1 1 1 1 : : : 1
Db 231 CTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVAS 290

Qy 327 IDRRSREQEL -- ATREKAQSLAALFLEKVRDL DDFLWKAET KDLD 369
: : I 1 1 1 : 1 1 1 : 1 1 1 : II
Db 291 GEYGDQNSRLVRAVRE GMCDSDHKRDLGGDAEVNPFLWHRPSEEVKQTKRLKGLR 345

Qy 370 EDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSM 429
Db 346 KDSSSMEGCHSFSASCL TQFCILFKRTFLSIMRDSVLTHLRITSHIGIGL 395



Qy 430 TIGFLYFGHGSIQLSFMDTAALLF MI GALI PFNVT LDVI S KCYSERAMLYYELE 483 

I I I I I I : : : I I I I I : I : I : I I 

Db 396 LIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMP TVLTFPLE 440 

Qy 484 DGL YTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLL 534 

I : I : I : I I : : : I : : I I I : : hi I 

Db 441 MGVFLREHLNYWYSLKAYYIAKTMADVPFQIMFPVAYCSIVYWMTSQPSDAVAFVLFAAL 500 

Qy 535 WL VVFCCRIMALAAAALLPT FHMAS FFSNALYNS FYLAGGFMINLS S LWTVPAWI S KVS 594 

: : : I I : : I : I | | | : : : : | I : I : I 

Db 501 GTMT S L VAQ S LGL L I GAAS T S LQVAT FVG P VT AI P VL L F S G F FVS FDT I P T YLQWMS Y I S 560 

Qy 595 FLRWCFEGLMKIQF— SRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSG 652 

::|: III:: : I : : M ::::: I I : MM 
Db 561 YVRYGFEGVILSIYGLDREDLHCDIDETCHFQKSEAILRELDVENAKLY-LDFIVLG 616 

Qy 653 GFMVL YYVS LRFI 665 

: : : M I I I 
Db 617 IFFISLRLI 625 



RESULT 10 
D96553 

hypothetical protein F5D21-6 [imported] - Arabidopsis thaliana 
C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 23-Mar-2001 
C;Accession: D96553 

R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.
Nature 408, 816-820, 2000 

A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.; Langin-
Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.; Liu,
S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.; Miranda,
M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.; Pham, P.K.;
Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.

A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.; van Aken,
S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.; Fraser, C.M.;
Venter, J.C.; Davis, R.W.

A; Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis. 

A;Reference number: A86141; MUID:21016719; PMID:11130712

A;Accession: D96553 

A; Status : preliminary 

A; Molecule type: DNA 

A; Residues: 1-687 <STO> 

A;Cross-references: GB:AE005173; NID:g10092349; PIDN:AAG12758.1; GSPDB:GN00141

C; Genetics: 

A; Gene: F5D21.6 

A;Map position: 1 



C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein 
F12L6.1; ATP-binding cassette homology 



Query Match 17.0%; Score 595; DB 2; Length 687;

Best Local Similarity 26.9%; Pred. No. 4.8e-39;

Matches 174; Conservative 127; Mismatches 247; Indels 98; Gaps 19; 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKC 147 

: |: |:::||:| II |:::||| : II | :| : :||: : :| 

Db 45 LDGLNGHAEPGRIMAIMGPSGSGKSTLLDSLAGRLARNVIMTGNLLLNGKKA--RLDYGL 102

Qy 148 VAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGN 207

M : | I : I : II I I I I : : I : I I : : : : I I I I I I : I I I : I I 

Db 103 VAYVTQEDI LMGTLTVRET IT YSAHLRLS S DLTKEEVNDIVEGT 1 1 ELGLQDCADRVI GN 162 

Qy 208 MYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAK-GNRLVL 266 

: | | : | | | | | : I I I : : : : I I I I I I I I I I I I I I : I : : : I : I : I I I : 
D b 163 WHSRGVSGGERKRVSVALEILTRPQILFLDEPTSGLDSASAFFVIQALRNIARDGGRTW 222 

Qy 267 ISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAI GYPCPRYSNPADFYVDLTS 326 

I : I I I I : : I I I I : I : : I I = I I : : I : : I I : M I : I I : I : : : 
Db 223 SSIHQPSSEVFALFDDLFLLSSGETVYFGESKFAVEFFAEAGFPCPKKRNPSDHFLRCIN 282 

Q y 327 ID RRSREQELATREKAQSLAALFLEKVRDLDDF LWKAETKDLDED 371 

| : | I I II: : I I : I I : : : : I : : I 

Db 2 83 S DFDTVTATLKGSQRI RET P- AT S DPLMNLAT S EI - KARLVENYRRS VYAKSAKS RI REL 34 0 

Qy 372 TCVE SSVTPLDTNCLPS PTKMPGAVQQFTTLI RRQI SNDFRDL PTL 417 

: | || : I I I : I I I I : : 

Db 341 AS I EGHHGMEVRKGSEAT WFKQLRTLTKRSFVNMCRDIGYYWSRI 385 

Qy 418 LIHGAEACLMSMTI GFLYF — GHGS IQLSFMDTAALLFMI GALI PFNVI LDV I SKCYS 473 

: | : : : | : | : : : I I I : I : I : I : : 

Db 386 VIY 1 WS FCVGTI FYDVGH SYTSILARVSCGGFITGFMTFMSIGGFPSFIE 436 

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFL 533 

| : | | || I : : I I :| I II : III: : I 

Db 437 EMKVFYKERLSGYYGVSVYIISNYVSSFPFLVAIALITGSITYNMVKFRPGVSHWAFFCL 496 

Qy 534 LVWLWFCCRIMALAAAALLPTFHMAS FFSNALYNS FYLAGGFMINLS SL WTVPAW 589 

: : | : : | : | : | | I : : I I II II 

Db 497 NIFFSVSVIESLMMWASLVPNFLMGLITGAGIIGIIMMTSGFFRLLPDLPKVFWRYP 554 

Qy 590 ISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAM ELDSYPLYAI 643 

| | : | : I : I I I : I : : : | : : : : : : : I I I 

Db 555 ISFMSYGSWAIQGAYKNDFLGLEFD-PMFAGEPKMTGEQVINKIFGVQVTHSKWWDLSAI 613 

Qy 644 YLIVIGLSGGFMVLYYVSLRF IKQKPS 670 

||:: : : | : : : I : : I : : I I 

Db 614 VLILV CYRI LFFIVLKLKERAEPALKAIQAKRTMKSLKKRPS 655 



RESULT 11 
T47652 

ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T26I12.10 

C; Species: Arabidopsis thaliana (mouse-ear cress) 



C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000 
C;Accession: T47652 

R;Monfort, A.; Casacuberta, E.; Puigdomenech, P.; Mewes, H.W.; Lemcke, K.;
Mayer, K.F.X.; Quetier, F.; Salanoubat, M.

submitted to the Protein Sequence Database, February 2000 

A; Reference number: Z24471 

A; Accession: T47652 

A; Status: preliminary 

A;Molecule type: DNA 

A; Residues: 1-725 <MON> 

A;Cross-references: EMBL:AL132954

A; Experimental source: cultivar Columbia; BAC clone T26I12 
C; Genetics : 
A;Map position: 3 
A;Note: T26I12.10 

C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein 
F12L6.1; ATP-binding cassette homology 

Query Match 16.9%; Score 591; DB 2; Length 725; 

Best Local Similarity 25.5%; Pred. No. 1.1e-38;

Matches 175; Conservative 135; Mismatches 273; Indels 104; Gaps 18; 

Qy 44 PNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAI 103

| | :| I I I : : | || : : ::| : | :||:

Db 70 PYVLNFNNLQYDVTLRRRFGF SRQNGVKTLLDDVSGEASDGDILAV 115

Qy 104 IGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQP-SSPQLVRKCVAHVRQHNQLLPNLT 162 

:|:|| |:::|:| : II I :: I : :ll: :|:: hi I : I I II 

Db 116 LGASGAGKSTLIDAIiAGRVAEGSLR-GSVTLNGEKVLQSRLLKVI SAYVMQDDLLFPMLT 174 

Qy 163 VRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVS 222 

| : | | | | : : I I I I : I : : :: : I I I : I : I II I : I : I : I I : I I I I I I I I I 

Db 175 VKETLMFASEFRLPRSLSKSKKMERVEALIDQLGLRNAANTVIGDEGHRGVSGGERRRVS 234 

Qy 223 IGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDL 282

I I : : : : I : I I I I I I I I I I I I : I : I I : I : : I : : I : I I I : I II 
Db 235 IGIDIIHDPIVLFLDEPTSGLDSTNAFMWQVLKRIAQSGSIVIMSIHQPSARIVELLDR 294 

Qy 283 VLLMTSGTPIYLGAAQHMVQYFTAI GYPCPRYSNPADFYVDLTSIDRRSREQELATREKA 342 

:::::] : : I : : : I : III I : : I : I I III : I 

Db 295 LI ILSRGKSVFNGSPASLPGFFSDFGRPI PEKENI SEFALDLV RELE-GSNEGT 347 

Qy 343 QSLAALFLEKVRDLDDFLWK AETKDLDED TCVESSVTP LDT 383 

: : I III I : : I I I : I : : I I : : 

Db 34 8 KALVD-FNEK WQQNKISLIQSAPQTNKLDQDRSLSLKEAINASVSRGKLVSG 398 

Qy 384 NCLPSPTKM PGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLY 435 

: : | | | I : I I : I : I I : I I : I : : I I 

Db 399 SSRSNPTSMETVSSYANPSLFETF-ILAKRYMKNWIR-MPELV — GTRIATVMVTGCLLA 454 

Qy 436 FGHGS IQLS FMDTAALLFMI GALI P — FNVI LDVT SKCYSERAMLYYELEDGLYTTGP YF 493 

: : : I : : : I I II : II : I I I I 

Db 455 TVYWKLDHTPRGAQERLTLFAFWPTMFYCCLDNVPVFIQERYIFLRETTHNAYRTSSYV 514 

Qy 494 FAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALL 553

: | ||: I : : : I : I I I I : I : : I I : : : : : : : 
Db 515 ISHSLVSLPQLLAPSLVFSAITFWTVGLSGGLEGFVFYCLLIYASFWSGSSWTFISGW 574 



Qy 554 PTFHMAS FFSNALYNS FYLAGGFMINLS S L WTVPAWISKVSFLRWCFEGLMKIQFS- 609 

| : I III:! : II I :| |:: :| :: :| 
Db 575 PNIMLCYMVSITYLAYCLLLSGFYVNRDRIPFYWT WFH YI S I LKYPYEAVLINEFDD 631 

Q y 610 RRTYKMPLGNLTIAVSGDKILSAMELDS 637 

:: : : I :| :|: : 
Db 632 PSRCFVRGVQVFDSTLLGGVSDSGKVKLLETLSKSLRTKITESTCLRTGSDLLAQQGITQ 691 

Qy 638 YPLYAIYLIVIGLSGGFMVLYYVSLRF 664 

: I | : | : | : | I 

Db 692 LSKWDCLWITFASGLFFRILFYFALLF 718 



RESULT 12 
B88474 

protein C05D10.3 [imported] - Caenorhabditis elegans 
C; Species: Caenorhabditis elegans 

C;Date: 10-May-2001 #sequence_revision 10-May-2001 #text_change 15-Jun-2001 
C; Accession: B88474 

R; anonymous, The C. elegans Sequencing Consortium. 
Science 282, 2012-2018, 1998 

A; Title: Genome sequence of the nematode C. elegans: a platform for 
investigating biology. 

A;Reference number: A75000; MUID:99069613; PMID:9851916
A;Note: see websites genome.wustl.edu/gsc/C_elegans/ and
www.sanger.ac.uk/Projects/C_elegans/ for a list of authors

A;Note: published errata appeared in Science 283, 35, 1999; Science 283, 2103,
1999; and Science 285, 1493, 1999

A; Accession: B88474 

A; Status: preliminary 

A; Molecule type: DNA 

A; Residues: 1-559 <STO> 

A;Cross-references: GB:chr_III; PIDN:AAA20989.1; PID:g532111; GSPDB:GN00021;
CESP:C05D10.3 

A;Note: similar to D. melanogaster white protein 

C; Genetics : 

A; Gene: C05D10.3 

A; Map position: 3 

C;Superfamily: fruit fly white protein; ATP-binding cassette homology 

Query Match 16.8%; Score 590; DB 2; Length 559; 

Best Local Similarity 29.3%; Pred. No. 9e-39; 

Matches 159; Conservative 98; Mismatches 231; Indels 54; Gaps 11 

Qy 88 IQNLS FKVRS GQMLAI IGS SGCGRAS LLDVITGRGHGGKI KSGQIWINGQP S S PQLVRKC 147 

: I : I I I : : I I I : I I I I I : : I : : I : I I I I I : I : : : : I : 

Db 10 LHNVSGMAES GKLLAI LGS S GAGKTTLMNVLT S RNLTNLDVQGS I LI DGRRANKWKI REM 69 

Qy 148 VAHVRQHNQLLPNLTVRETLAFIAQMRL-PRTFSQAQRDKRVEDVIAELRLRQCADTRVG 206 

I I : I I : : :| II I |:|::|: : :| :| INI::: |::|||| :| 
Db 70 SAFVQQHDMFVGTMTAREHLQFMARLRMGDQYYSDHERQLRVEQVLTQMGLKKCADTVIG 129

Qy 207 -NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLV 265

: : I I I M : : I : I : : I I I I I I I I II I I : I I : : I : I I I I 
Db 130 IPNQLKGLSCGEKKRLSFASEILTCPKILFCDEPTSGLDAFMAGHWQALRSLADNGMTV 189 



Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAI GYPCPRYSNPADFYVDLT 325 

: | :: | I I I :: I I : I I I I I I I I I I I I I I I I I I I I : 

D b 190 IITIHQPSSHVYSLFNNVCLMACGRVIYLGPGDQAVPLFEKCGYPCPAYYNPADHLIRTL 249 

Qy 326 S I DRRSREQELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTC VES 376 

: : I : I : I : I II : I I : I 

Db 250 AVIDSDRATSMKT IS KIR — QGFL STDLGQSVLAIGNANKLRAAS 2 92 

Qy 377 SVTPLDTNCLPSPTKM PGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSM 429 

MM: M Mil Ml:::: 

Db 293 FVTGSDTS EKTKTFFNQDYNASFWTQFLALFWRSWLTVI RDPNLLSVRLLQILITAF 34 9 

Qy 430 TIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDV 1 S KC YS ERAMLYYE 481 

I : : I I I : I :: II I :: : :| :: I 

D b 350 ITGIVFF QTPVTPATIISINGIM- FNHIRNMNFMLQFPNVPVITAELPIVLRE 401 

Qy 482 LEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWFC 541 

: I : I I I I II : I I I : : I : I I I : : I I : I : I : 

Db 402 NANGWRTSAYFLAKNIAELPQYIILPILYNTIVYWMSGLYPNFWNYCFASLVTILITNV 461 

Qy 542 CRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFE 601

: : | I : : I I I I I : : : | : | : I : : : : I 

Db 462 AI S I S YAVAT I FANTDVAMT I LP I FWP IMAFGGFFI T FDAI P S YFKWLS S LS YFKYGYE 521 

Qy 602 GL 603 

I 

Db 522 AL 523 



RESULT 13 
T47648 

ABC transporter-like protein - Arabidopsis thaliana 

N; Alternate names: protein T15C9.80 

C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000 
C;Accession: T47648 

R;Mewes, H.W.; Rudd, S.; Lemcke, K . ; Mayer, K.F.X. 
submitted to the Protein Sequence Database, April 2000 
A; Reference number: Z24470 
A; Accession: T47648 
A; Status: preliminary 
A;Molecule type: DNA 
A; Residues: 1-720 <MEW> 
A;Cross-references: EMBL: AL132970 

A;Experimental source: cultivar Columbia; BAC clone T15C9
C;Genetics:
A;Map position: 3
A;Note: T15C9.80
C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology 

Query Match 16.8%; Score 589.5; DB 2; Length 720; 

Best Local Similarity 25.0%; Pred. No. 1.4e-38; 

Matches 173; Conservative 122; Mismatches 289; Indels 109; Gaps 14 
Qy 44 PNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAI 103 



Db 54 P FVLS FNNLT YNVS VRRKLDFHD LVPWRRTSFSKTKTL-LDNISGETRDGEILAV 107



Qy 104 IGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTV 163 

:|:|| |:::|:| : I I :| I : :||: :::: |:| I : I I III 
Db 108 LGASGSGKSTLIDALANRIAKGSLK-GTWLNGEALQSRMLPCVISAYVMQDDLLFPMLTV 166 

Qy 164 RETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSI 223

I | | | | : | | M : : : : : I I : : I : I : I I I : I : I I : I I I I I I I I I I 
Db 167 EETLMFAAEFRLPRS LPKS KKKLRVQALI DQLGI RNAAKT 1 1 GDEGHRGI S GGERRRVS I 226 

Qy 224 GVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLV 283 

I : : : : I : I I I I I I I I I I I : I : I I I I : I : : : : : I : I I I : I I : 
Db 227 GIDIIHDPIVLFLDEPTSGLDSTSAFMVVKVLKRIAESGSIIIMSIHQPSHRVLSLLDRL 286 

Qy 284 LLMTSGTPIYLGAAQHMVQYFTAI GYPCPRYSNPADFYVDLTS IDRRS 331 

: : : | : : I : : : I III I : I : I I : : 

Db 287 IFLSRGHTVFSGSPASLPSFFAGFGNPIPENENQTEFALDLIRELEGSAGGTRGLVEFNK 346 

Qy 332 REQ ELAT RE KAQ S LAA LFLEKVRDLDDFLWKAETKDLDEDTCVE SSVT 37 9 

: I I: : 1:1 I I : M : : I I I I 

Db 347 KWQEMKKQSNPQTLTPPASPNPNLTLK EAI S AS I S RGKLVS GGGGGS S VI 396 

Qy 380 PLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLY 435 

I I : I I I I I I I III I : : I I I : 

Db 397 NHGGGTLAVPAFANPFWIEIKTLTRRSILNSRRQ-PELL — GMRLAT VI VT - G F I LAT VF 452 

Qy 436 FGHGS I QLS FMDTAALLFMI GALI P FNVI LDVI S KC YS ERAMLY YEL 482 

I : :| I I I : I I : I 

Db 453 W RL DN S P KGVQ E RL G F FAFAMS TM FYTCADALPVFLQERYI FMRET 498 

Qy 483 EDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCC 542

| | : : I : : : | : | | M I I : I : : : 

Db 499 AYNAYRRSSYVLSHAIWFPSLIFLSLAFAWTFWAVGLEGGLMGFLFYCLIILASFWSG 558 

Qy 543 RIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEG 602 

: : : I : I : I I I I I I : I : I : : : : I 

Db 559 SSFVTFLSGWPHVMLGYTIWAILAYFLLFSGFFINRDRIPQYWIWFHYLSLVKYPYEA 618 

Qy 603 LMKIQFSRRTY KMPLGNLTIAV SGDKI 62 9 

: : : : I I I INN: : I : 

Db 619 VLQNEFSDPTECFVRGVQLFDNSPLGELTYGMKLRLLDSVSRSIGMRISSSTCLTTGADV 678 

Qy 630 LSAMELDSYPLYAIYLIVIGLSGGFMVLYYVSL 662 

I : : I I : I I : I : I : I 

Db 679 LKQQGVTQLSKWNCLLITVGFGFLFRILFYLCL 711 



RESULT 14 
T47650 

ABC transporter-like protein - Arabidopsis thaliana

N; Alternate names: protein T15C9.110 

C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000 
C; Accession: T47650 

R;Mewes, H.W.; Rudd, S.; Lemcke, K. ; Mayer, K.F.X. 
submitted to the Protein Sequence Database, April 2000 
A; Reference number: Z24470 



A;Accession: T47650
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-708 <MEW>
A;Cross-references: EMBL:AL132970

A; Experimental source: cultivar Columbia; BAC clone T15C9 
C; Genetics : 
A;Map position: 3 
A;Note: T15C9.110 

C; Super family: Arabidopsis thaliana probable ATP-binding cassette protein 
F12L6.1; ATP-binding cassette homology 

Query Match 16.7%; Score 586.5; DB 2; Length 708; 

Best Local Similarity 26.8%; Pred. No. 2.4e-38; 

Matches 187; Conservative 124; Mismatches 254; Indels 133; Gaps 20; 

Qy 44 PNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAI 103 

|| :|:| I I : I M | : : : : : : I I : : I I : 

D b 60 PFLLSFNNLSYNWLRRRF DFSRRKTA SVKTLLDDITGEARDGEILAV 107 

Qy 104 IGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQP-SSPQLVRKCVAHVRQHNQLLPNLT 162 

: I I I I ::: I : I : I I :| I : :||: :|:: |:| I : I I II 

Db 108 LGGS GAGKSTLI DALAGRVAEDS LK- GTVTLNGEKVLQS RLLKVI SAYVMQDDLLFPMLT 166 

Qy 163 VRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVS 222 

| : | | | | :: M I I : : : : : : I I I : I : I I I I I I : I : I I : I I I I I I ! I I 
Db 167 VKETLMFASEFRLPRSLPKSKKMERVETLIDQLGLRNAADTVIGDEGHRGVSGGERRRVS 226 

Qy 223 IGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDL 282 

| I : : : : I : I I I I I I I I I I I I : I : I I : I : : I : : I : I II : I II 

Db 227 IGIDIIHDP1LLFLDEPTSGLDSTNAFMWQVLKRIAQSGSWIMSIHQPSARIIGLLDR 286 

Qy 283 VLLMTSGTPIYLGAAQHMVQYFTAI GYPCPRYSNPADFYVDLTSIDRRSREQELATREKA 342 

: : : : : I : : I : : : I : : I I I I : I : I : III:: 
Db 287 LIILSHGKSVFNGSPVSLPSFFSSFGRPIPEKENITEFALDVI RELEGSS 336 

Qy 343 QSLAALFLEKVRDLDDF — LW KAETK DLDEDTCV ESSVTPL 381 

I I II : I I : I I : I I I I : 

Db 337 EGTRDLVEFNEKWQQNQTARATTQSRVSLKEAIAASVSRGKLVSGSSGANPI 388 

Qy 382 DTNCLPS PTKMPGAVQQFTTLI RRQI SNDFRD LPTLLIHGAEACLMSMTI 431 

: I I : : I :| I I I : I::: I I: I: 

Db 389 SMETVSSYANPP — LAETFI LAKRYI KNWI RT PELI GMRI GTVMVTG LLLATVYWR 442 

Qy 432 GFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLY 479 

MM:: I I I II : 

Db 443 LDNTPRGAQERMGFFAFGMSTM FYCCADNIPVFIQERYIFL 483 

Qy 480 YELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLW 539 

| I I I : I II: I I : Ml I I I : I : I : : : 

Db 484 RETTHNAYRTSSYVISHALVSLPQLLALSIAFAATTFWTVGLSGGLESFFYYCLIIYAAF 543 

Qy 540 FCCRIMALAAAALLPTFHMAS FFSNALYNS FYLAGGFMINLS SLWTVPAWI SKVS FLRWC 599 

: : : I : I I : : I : I I I I II : I : I I : : 

Db 544 WSGSSIVTFISGLIPNVMMSYMVTIAYLSYCLLLGGFYINRDRIPLYWIWFHYISLLKYP 603 



Qy 600 FEGLMKIQF SR RTYKMPLGNLTIAVS GDKILSAMELDSY 638



:| :: :| II : :: I I II I II : I : 

Db 604 YEAVLINEFDDPSRCFVKGVQVFDGTLLAEVSHVMKVKLLDTLSGSLGTKITESTCLRTG 663 

Qy 639 P LYAIYLIVIGLSGG — FMVLYYVSLRF 664 

I I : I I : I I : I : I : I I I 

Db 664 PDLLMQQGITQLSKWDCLWITLAWGLFFRILFYLSLLF 701 



RESULT 15 
T31958 

hypothetical protein F02E11.1 - Caenorhabditis elegans 
C; Species: Caenorhabditis elegans 

C;Date: 29-Oct-1999 #sequence_revision 29-Oct-1999 #text_change 31-Jan-2000
C;Accession: T31958 
R;Favello, A.; Scheet, P. 

submitted to the EMBL Data Library, July 1997 

A; Description: The sequence of C. elegans cosmid F02E11. 

A;Reference number: Z21104 

A;Accession: T31958 

A; Status: preliminary; translated from GB/EMBL/DDBJ 
A; Molecule type: DNA 
A; Residues: 1-658 <FAV> 

A;Cross-references: EMBL:AF016661; PIDN:AAB66050.1; GSPDB:GN00020; CESP:F02E11.1
A;Experimental source: strain Bristol N2; clone F02E11
C;Genetics:
A;Gene: CESP:F02E11.1

A; Map position: 2 

A;Introns: 115/3; 158/3; 214/3; 330/3; 368/2; 448/3; 525/1 
C;Superfamily: fruit fly white protein; ATP-binding cassette homology 

Query Match 16.5%; Score 579; DB 2; Length 658; 

Best Local Similarity 28.3%; Pred. No. 8.5e-38; 



Matches 180; Conservative 113; Mismatches 257; Indels 86; Gaps 22;


Qy 78 PSCQNSCELGIQNLSFKV RS GQMLAI I GS SGCGRAS LLDVI TGRGHGGKI KSGQIW 133
II || : 1 | : : | | : : | | | | : : | : : : : 1 | :
Db 62 PECLAVCALPTSSYQISVSGVAEPGEVIAmGGSGAGKTTLMNIIAHLDTNGVEYLGDVT 121

Qy 134 INGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIA 193
: || : : 1 : 1 : 1 : 1 : 1 : 1 1 1 1 1 1 : 1 1 1 : | | : : | | I : I :
Db 122 VNGKKITKQKMRQMCAYVQQVDLFCGTLTVREQLTYTAHMRMKNATVQ-QKMERVENVLR 180

Qy 194 ELRLRQCADTRVG-NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLV 252
: : | I : I : 1 : : 1 : 1 1 1 : : 1 : : : : 1 : 1 1 1 1 1 1 1 1 1 1 1 : 1 1 : 1
Db 181 DMNLTDCQNTLIGI PNRMKGI S I GEKKRLAFACEI LTDPKI LFCDEPTSGLDAFMASEW 240

Qy 253 KTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIG--YP 310
: I || : : : : 1 1 1 1 1 : 1 1 : 1 1 1 : 1 : 1 1 1 = =1 : 1 :
Db 241 RALLDLANKGKTIIVVLHQPSSTVFRMFHKVCFMATGKTVYHGAVDRLCPFFDKLGPDFR 300

Qy 311 CPRYSNPADFYVDLTSIDRRSREQELATR EKAQSLAALFLEKVRD-LDDFLWK 362
1 Mill: II : 1 1 1 1 1 : : 1 : 1 1 : 1 : 1
Db 301 VPESYNPADFVMSEISISPET-EQEDVTRIEYLIHEYQNSDIGTQMLKKTRTAVDEFGGY 359

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGA 422
: : | 1 : I | | | : : | : III I : 1
Db 360 GDDEDDGESRYNSTFGT QFEILLKRSLRTTFRDPLLLRVRFA 401



Qy 423 EACLMSMT I GFL YFGHGS I QL SFMDTAALLFMIGALIPFNVILDVISKCYSERAMLY 479 

Db 402 QILATAILVGIV NWRVELKGPT I QNLEGVMYNCARDMT FLFYFP SVNVI T S ELPVFL 458 

Qy 480 YELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLH FLLV 535 

I : : I : I I I I I I I I : : : I I I I I : I I : I I : I 

Db 459 REHKSNI YSVEAYFLAKS LAELPQYT I LPMI YGT 1 1 YWMAGLVAS VTS FLVFVFVCI TLT 518 

Qy 536 WLWFCCRIMA — LAAAALLPT FHMAS FFSNALYNS FYLAGGFMINLS S LWTVPAWI S KV 593 

| : | : I I : II I I I : | | | : I : I : I : I 
Db 519 WVAVSIAYVGACI FGDEGLWTF-MPMFVLPML VFGGFYVNANS IPVYYQYV 569 

Qy 594 SFLRWC FEGLMKIQFSRRTYKM PLGNLTI AVSGDKILSAMELDSYP 639 

I I : I III I : : I : II I I I I I : : I : I 

Db 570 SFVSWFKHGFEALEANQW-KEIDKISGCDLINPLNATTTGYCPASDGPGILTRRGIDT-P 627 

Qy 640 LYAIYLI VI GLSGGFMVLY YVS LRFI K 666 

I I I I I : I I I I : : I I I 

Db 628 LYANVLILFMSFFVYRIIGL VAL K I RVRFAK 658 



Search completed: February 27, 2004, 07:18:57 
Job time : 16.9951 secs
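
The percentages printed in the hit headers above appear to be mutually consistent under two simple assumptions, neither of which is stated in this listing: that "Query Match" is the hit score divided by the perfect score (3506 for this query), and that "Best Local Similarity" is the number of matches divided by the total number of alignment columns (matches + conservative + mismatches + indels). The following Python sketch recomputes both figures from the numbers printed above; the function names are hypothetical and the two definitions are assumptions, not GenCore's documented formulas.

```python
# Summary figures copied from the hit headers in this listing.
PERFECT_SCORE = 3506  # "Perfect score" reported for query US-09-989-981A-8

# accession: (score, matches, conservative, mismatches, indels,
#             printed Query Match %, printed Best Local Similarity %)
HITS = {
    "T08934": (651.5, 191, 104, 240, 79, 18.6, 31.1),
    "G02068": (618.0, 173, 130, 266, 104, 17.6, 25.7),
    "D96553": (595.0, 174, 127, 247, 98, 17.0, 26.9),
    "T47652": (591.0, 175, 135, 273, 104, 16.9, 25.5),
    "B88474": (590.0, 159, 98, 231, 54, 16.8, 29.3),
    "T47648": (589.5, 173, 122, 289, 109, 16.8, 25.0),
    "T47650": (586.5, 187, 124, 254, 133, 16.7, 26.8),
    "T31958": (579.0, 180, 113, 257, 86, 16.5, 28.3),
}


def query_match(score, perfect=PERFECT_SCORE):
    """Assumed definition: hit score as a percentage of the perfect score."""
    return round(100.0 * score / perfect, 1)


def best_local_similarity(matches, conservative, mismatches, indels):
    """Assumed definition: identical columns as a percentage of all columns."""
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


def check_hits(hits=HITS):
    """Return accessions whose printed percentages differ from the recomputed ones."""
    bad = []
    for acc, (score, m, c, mm, ind, qm, sim) in hits.items():
        if query_match(score) != qm or best_local_similarity(m, c, mm, ind) != sim:
            bad.append(acc)
    return bad
```

Calling check_hits() returns the accessions, if any, whose printed percentages disagree with the recomputed ones, which makes it a quick way to spot a digit damaged in extraction.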



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: February 27, 2004, 07:17:39 ; Search time 30.2443 Seconds 

(without alignments) 
4698.604 Million cell updates/sec 



Title: US-09-989-981A-8
Perfect score: 3506
Sequence: 1 MAGKAAEERGLPKGATPQDT...FMVLYYVSLRFIKQKPSQDW 673

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5
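
The run parameters above describe a Smith-Waterman ("sw model") local alignment with a gap-open penalty of 10.0 and a gap-extension penalty of 0.5. As a rough illustration of how such a score is computed, here is a minimal Python sketch of the score-only recurrence with affine gaps. It is not GenCore's implementation: the substitution function toy_score is a hypothetical stand-in for BLOSUM62 (which is too large to reproduce here), and the convention that the first gapped residue costs gap_open and each further one costs gap_extend is an assumption.

```python
def smith_waterman_affine(a, b, score, gap_open=10.0, gap_extend=0.5):
    """Best local alignment score of a and b with affine gap penalties.

    Assumed convention: a gap of length k costs gap_open + (k - 1) * gap_extend.
    """
    neg = float("-inf")
    cols = len(b) + 1
    h_prev = [0.0] * cols   # H: best score of an alignment ending at (i-1, j)
    f_prev = [neg] * cols   # F: best score ending with a residue of a opposite a gap
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * cols
        f_cur = [neg] * cols
        e = neg             # E: best score ending with a residue of b opposite a gap
        for j in range(1, cols):
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            diag = h_prev[j - 1] + score(a[i - 1], b[j - 1])
            # Local alignment: a cell never drops below zero.
            h_cur[j] = max(0.0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


def toy_score(x, y):
    """Hypothetical substitution scores: +5 for identity, -4 otherwise."""
    return 5.0 if x == y else -4.0
```

With a real substitution matrix in place of toy_score, aligning the query against itself would give the self-score that this listing reports as the perfect score.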



Searched: 809742 seqs, 211153259 residues 

Total number of hits satisfying chosen parameters: 809742
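
The throughput figure in the run header (4698.604 million cell updates per second) is consistent with the usual definition of a cell update: one dynamic-programming cell per pair of query residue and database residue. The short Python sketch below recomputes it from the query length (673), the number of database residues (211153259) and the search time (30.2443 seconds) printed in this header. The function name is hypothetical, and the formula is an assumption that happens to reproduce the printed value to within the rounding of the search time.

```python
def cell_updates_per_second(query_length, database_residues, search_seconds):
    """Dynamic-programming cells filled per second for one query against a database."""
    return query_length * database_residues / search_seconds


# Values printed in the run header of this listing.
rate = cell_updates_per_second(673, 211153259, 30.2443)
millions = rate / 1e6  # compare with the printed "Million cell updates/sec"
```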



Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database : Published_Applications_AA:*

1: /cgn2_6/ptodata/2/pubpaa/US07_PUBCOMB.pep:*
2: /cgn2_6/ptodata/2/pubpaa/PCT_NEW_PUB.pep:*
3: /cgn2_6/ptodata/2/pubpaa/US06_NEW_PUB.pep:*
4: /cgn2_6/ptodata/2/pubpaa/US06_PUBCOMB.pep:*
5: /cgn2_6/ptodata/2/pubpaa/US07_NEW_PUB.pep:*
6: /cgn2_6/ptodata/2/pubpaa/PCTUS_PUBCOMB.pep:*
7: /cgn2_6/ptodata/2/pubpaa/US08_NEW_PUB.pep:*
8: /cgn2_6/ptodata/2/pubpaa/US08_PUBCOMB.pep:*
9: /cgn2_6/ptodata/2/pubpaa/US09A_PUBCOMB.pep:*
10: /cgn2_6/ptodata/2/pubpaa/US09B_PUBCOMB.pep:*
11: /cgn2_6/ptodata/2/pubpaa/US09C_PUBCOMB.pep:*
12: /cgn2_6/ptodata/2/pubpaa/US09_NEW_PUB.pep:*
13: /cgn2_6/ptodata/2/pubpaa/US10A_PUBCOMB.pep:*
14: /cgn2_6/ptodata/2/pubpaa/US10B_PUBCOMB.pep:*
15: /cgn2_6/ptodata/2/pubpaa/US10C_PUBCOMB.pep:*
16: /cgn2_6/ptodata/2/pubpaa/US10_NEW_PUB.pep:*
17: /cgn2_6/ptodata/2/pubpaa/US60_NEW_PUB.pep:*
18: /cgn2_6/ptodata/2/pubpaa/US60_PUBCOMB.pep:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
ALIGNMENTS



RESULT 1 

US-09-989-981A-8 

; Sequence 8, Application US/09989981A 

; Publication No. US20030049730A1 

; GENERAL INFORMATION: 

; APPLICANT: Hobbs, Helen H.
; APPLICANT: Shan, Bei
; APPLICANT: Barnes, Robert
; APPLICANT: Tian, Hui
; APPLICANT: Tularik Inc.
; APPLICANT: Board of Regents, The University of Texas System
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
; FILE REFERENCE: 018781-007320US
; CURRENT APPLICATION NUMBER: US/09/989,981A
; CURRENT FILING DATE: 2002-07-23
; PRIOR APPLICATION NUMBER: US 60/252,235
; PRIOR FILING DATE: 2000-11-20
; PRIOR APPLICATION NUMBER: US 60/253,645
; PRIOR FILING DATE: 2000-11-28
; NUMBER OF SEQ ID NOS: 13
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 8
; LENGTH: 673
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: human ABCG8 (hABCG8)
US-09-989-981A-8 

Query Match 100.0%; Score 3506; DB 10; Length 673; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 673; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60

Qy 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120

Qy 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180

Qy 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240

Qy 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300

Qy 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360

Qy 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420

Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480

Qy 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVF 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVF 540

Qy 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600

Qy 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660

Qy 661 SLRFIKQKPSQDW 673
       |||||||||||||
Db 661 SLRFIKQKPSQDW 673



RESULT 2 
US-10-090-455-7 

; Sequence 7, Application US/10090455
; Publication No. US20030027259A1
; GENERAL INFORMATION:
; APPLICANT: Chen, Hongyun
; APPLICANT: Le Bihan, Stephane
; TITLE OF INVENTION: NOVEL ABCG4 TRANSPORTER AND USES THEREOF
; FILE REFERENCE: 100103.406
; CURRENT APPLICATION NUMBER: US/10/090,455
; CURRENT FILING DATE: 2002-03-01
; NUMBER OF SEQ ID NOS: 17
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 7
; LENGTH: 673
; TYPE: PRT
; ORGANISM: Homo sapiens
US-10-090-455-7 



Query Match 99.9%; Score 3502; DB 14; Length 673; 

Best Local Similarity 99.9%; Pred. No. 0; 

Matches 672; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 



Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60

Qy 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120

Qy 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180

Qy 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240



Qy 


241 


Db 


241 


Qy 


301 


Db 


301 


Qy 


361 


Db 


361 


Qy 


421 


Db 


421 


Qy 


481 


Db 


481 


Qy 


541 


Db 


541 


Qv 


601 


Db 


601 


Qy 


661 


Db 


661 



S GL D S FT AHN L VKT L S RLAKGN RL VL ISLHQPRSDI FRL FD LVL LMT S GT P I Y L GAAQHM 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I 

S GLD S FTAHNLVKT L S RLAKGN RLVL I S LHQ P RS D I FRLFDLVLLMT S GT P I YLGAAQHM 300 

VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 
I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
VQYFTAIGYPCPRYSNP7VDFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 

WKAETKDLDEDTCVES SVTPLDTNCLPS PTKMPGAVQQFTTLI RRQI SNDFRDLPTLLIH 420 
I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQI SNDFRDLPTLLIH 420 

GAEACLMSMTI GFLYFGHGS IQLS FMDTAALLFMI GALI PFNVILDVI SKCYSERAMLYY 480 
I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I I I I M I I M I I I I I I I 
GAEACLMSMTI GFLYFGHGS I QLS FMDTAALLFMI GALI P FNVI LDVI S KC YS ERAMLYY 4 8 0 

ELEDGLYTTGP YFFAKI LGELPEHCAYI 1 1 YGMPT YWLANLRPGLQPFLLHFLLVWLWF 54 0 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I M I 
ELEDGLYTTGPYFFAKI LGELPEHCAYI 1 1 YGMPT YWLANLRPGLQPFLLHFLLVWLWF 54 0 

CCRIMALAAAALLPTFHMAS FFSNALYNSFYLAGGFMINLS SLWTVPAWI SKVS FLRWCF 600 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I 
CCRIMALAAAALLPTFHMAS FFSNALYNSFYLAGGFMINLS SLWTVPAWI SKVS FLRWCF 600 

EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSVMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

S LRFI KQKP SQDW 673 , 

I I I I I I I I I I I I I 
SLRFIKQKPSQDW 673 



RESULT 3 

US-09-989-981A-4 

Sequence 4, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION: 
APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 
TITLE OF INVENTION: ABCG5 and ABCG8 : Compositions and Methods of Use 
FILE REFERENCE: 018781-007320US 
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23 
PRIOR APPLICATION NUMBER: US 60/252,235 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/253,645 
PRIOR FILING DATE: 2000-11-28 
NUMBER OF SEQ ID NOS: 13
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 4
LENGTH: 672
TYPE: PRT
ORGANISM: Mus musculus
FEATURE:
OTHER INFORMATION: mouse ABCG8 (mABCG8)
US-09-989-981A-4 

Query Match 82.2%; Score 2883.5; DB 10; Length 672; 

Best Local Similarity 81.9%; Pred. No. 1.3e-281; 

Matches 551; Conservative 52; Mismatches 69; Indels 1; Gaps 1; 

Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60
     || |  ||  |  |   || ||||| ||||||||||||||||| |||||||| ||||:||
Db 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60

Qy 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120
      ||||||||||||:|| | | |:||||||:|||||||||||||||||||||||||||||||
Db 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120

Qy 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180
       ||||||:||||||||||||:||||||||||||||:|||||||||||||||||||||||||
Db 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180

Qy 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240
       |||||||||||||||||||||:||||| ||||:|||||||||||||||||||||||||||
Db 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240

Qy 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300
       |||||||||||| ||||||||||||||||||||||||||||||||||||||||||||| |
Db 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300

Qy 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360
       |||||:||:||||||||||||||||||||||:|:|:|| ||||||||||||||:  ||||
Db 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360

Qy 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420
       |||| |:|:  |   |     ||:|  :  ::|| ::||:||||||||||||||||||||
Db 361 WKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419

Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480
       |:||||||: |||||:|||: ||||||||||||||||||||||||||:|||:|||:||||
Db 420 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLYY 479

Qy 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVF 540
       |||||||| ||||||||||||||||||:||| || ||| ||||  : |||||||||||||
Db 480 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLVVF 539

Qy 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600
       ||| |||||:|:||||||:||| |||||||||  |||||| :|| |||||||:|||||||
Db 540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599

Qy 601
       II I : I I I : I Ml I : : II : : I M M M M I I II I I I I I I M I I : III
Db 600

Qy 661 SLRFIKQKPSQDW 673
       ||: ||||  |||
Db 660 SLKLIKQKSIQDW 672



RESULT 4 
US-10-415-378-9 

Sequence 9, Application US/10415378 
Publication No. US20040014945A1
GENERAL INFORMATION: 
APPLICANT: INCYTE CORPORATION; TANG, Y. Tom
APPLICANT: YUE, Henry; NGUYEN, Danniel B.;
APPLICANT: HAFALIA, April J.A.; ELLIOTT, Vicki S.;
APPLICANT: LU, Yan; CHAWLA, Narinder K.;
APPLICANT: YAO, Monique G.; BAUGHN, Mariah R.;
APPLICANT: GANDHI, Ameena R.; DING, Li;
APPLICANT: SANJANWALA, Madhu Sudan M.; RAMKUMAR, Jayalaxmi;
APPLICANT: ARVIZU, Chandra S.; GIETZEN, Kimberly J.;
APPLICANT: LAL, Preeti G.; AZIMZAI, Yalda;
APPLICANT: KHAN, Farrah A.; THANGAVELU, Kavitha;
APPLICANT: THORNTON, Michael B.; LU, Dyung Aina M.;
APPLICANT: TRIBOULEY, Catherine M.; WARREN, Bridget A.;
APPLICANT: ISON, H. Craig; DAS, Debopriya;
APPLICANT: RAUMANN, Brigette E.; POLICKY, Jennifer L.;
APPLICANT: KEARNEY, Liam

TITLE OF INVENTION: TRANSPORTERS AND ION CHANNELS 
FILE REFERENCE: PI-0270 USN 

CURRENT APPLICATION NUMBER: US/10/415,378
CURRENT FILING DATE: 2003-05-07
PRIOR APPLICATION NUMBER: PCT/US01/46055
PRIOR FILING DATE: 2001-10-27 
PRIOR APPLICATION NUMBER: US 60/250,790 
PRIOR FILING DATE: 2000-12-01 
PRIOR APPLICATION NUMBER: US 60/252,232 
PRIOR FILING DATE: 2000-11-20 
PRIOR APPLICATION NUMBER: US 60/249,661 
PRIOR FILING DATE: 2000-11-17 
PRIOR APPLICATION NUMBER: US 60/247,673 
PRIOR FILING DATE: 2000-11-09 
PRIOR APPLICATION NUMBER: US 60/245,904 
PRIOR FILING DATE: 2000-11-03 
PRIOR APPLICATION NUMBER: US 60/243,989 
PRIOR FILING DATE: 2000-10-27 
NUMBER OF SEQ ID NOS: 40
SOFTWARE: PERL Program
SEQ ID NO 9
LENGTH: 374
TYPE: PRT
ORGANISM: Homo sapiens
FEATURE:
NAME/KEY: misc_feature
OTHER INFORMATION: Incyte ID No. US20040014945A1 6585710CD1
US-10-415-378-9 



Query Match 55.9%; Score 1961; DB 15; Length 374;
Best Local Similarity 99.7%; Pred. No. 7.1e-189;
Matches 373; Conservative 0; Mismatches 1; Indels 0; Gaps 0;

Qy 300 MVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDF 359
       || |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MVHYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDF 60

Qy 360 LWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 419
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 LWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 120

Qy 420 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLY 479
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLY 180

Qy 480 YELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVV 539
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 YELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVV 240

Qy 540 FCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWC 599
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 FCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWC 300

Qy 600 FEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYY 659
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 FEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYY 360

Qy 660 VSLRFIKQKPSQDW 673
       ||||||||||||||
Db 361 VSLRFIKQKPSQDW 374



RESULT 5 
US-09-837-992-3 

; Sequence 3, Application US/09837992
; Patent No. US20020081687A1
; GENERAL INFORMATION:
; APPLICANT: Tian, Hui
; APPLICANT: Schultz, Joshua
; APPLICANT: Shan, Bei
; APPLICANT: Tularik Inc.
; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions
; TITLE OF INVENTION: and Methods of Use
; FILE REFERENCE: 018781-006020US
; CURRENT APPLICATION NUMBER: US/09/837,992
; CURRENT FILING DATE: 2001-04-18
; PRIOR APPLICATION NUMBER: US 60/198,465
; PRIOR FILING DATE: 2000-04-18
; PRIOR APPLICATION NUMBER: US 60/204,234
; PRIOR FILING DATE: 2000-05-15
; NUMBER OF SEQ ID NOS: 45
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 3
; LENGTH: 651
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: human sitosterolemia susceptibility gene (SSG)
; OTHER INFORMATION: amino acid sequence

US-09-837-992-3 



Query Match 19.9%; Score 697; DB 9; Length 651;
Best Local Similarity 28.9%; Pred. No. 7.4e-61;
Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16;



Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75
1 1 : 1 1 1 II | : : | : : | : : | | : I 1 : : : : 1
Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134
I : : : : I 1 1 1 1 : : 1 : 1 1 1 1 1 : : 1 1 1 : : 1 1 1 1 1 : : : :
Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194
II: : : 1 : : 1 1 : 1 1 : 1 1 1 1 1 1 1 : 1 : : 1 : 1 : 1 1 1 : 1 1
Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254
|| || : 1 1 : 1 : 1 1 1 1 1 1 II 1 1 1 1 : 1 : : : 1 1 1 1 : 1 1 1 1 1 = : 1
Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314
| | | : | I : I : : : : I I I I I : : 1 : 1 1 1 : : : : 1 1 : 1 1 : : 1 Mill:
Db 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362
Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421
1 : : 1 II II : MM 1 M : :
Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475
: : 1 : : 1 MM 1 1 M | : : I : : : 1
Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ DRVGLLYQFVGATPYTGMLNAVNLFPVLR 446

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535
1 : 1 : 1 1 II 1 1 1 1 : M II 1 1 - 1
Db 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499

Qy 536 WLVVFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577
Mill : 1 : M: Ml 1 1 :
Db 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFL 549

Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625
1 : : | | : | : : 1 1 M M : | : : : :
Db 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597





RESULT 6 

US-09-989-981A-6 

; Sequence 6, Application US/09989981A
; Publication No. US20030049730A1
; GENERAL INFORMATION:
; APPLICANT: Hobbs, Helen H.
; APPLICANT: Shan, Bei
; APPLICANT: Barnes, Robert
; APPLICANT: Tian, Hui
; APPLICANT: Tularik Inc.
; APPLICANT: Board of Regents, The University of Texas System
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
; FILE REFERENCE: 018781-007320US
; CURRENT APPLICATION NUMBER: US/09/989,981A
; CURRENT FILING DATE: 2002-07-23
; PRIOR APPLICATION NUMBER: US 60/252,235
; PRIOR FILING DATE: 2000-11-20
; PRIOR APPLICATION NUMBER: US 60/253,645
; PRIOR FILING DATE: 2000-11-28
; NUMBER OF SEQ ID NOS: 13
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 6
; LENGTH: 651
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: human ABCG5 (hABCG5)
US-09-989-981A-6 

Query Match 19.9%; Score 697; DB 10; Length 651; 

Best Local Similarity 28.9%; Pred. No. 7.4e-61; 

Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16; 

Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75
II : I I I I I | : : | : : | : : | | : I I : : : : I
Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134
| : : : : | I I I I : : I : I I I I I : : I I I : : I I I I I : : : :
Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194
II: : : I : : I I : I I : I I I I I II : I : : I : I : I I I : I I
Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254
II II : I I : I : I I I I I I I I I I I I : I : : : I I I I : I I I I I : : I
Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314
| | | : I I : I : : : : I I I I I :: I : I I I : :: : I hi I : : I Mill:
Db 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362
I I I I I I : M I I: I M : I M : I : : I : : : : | : : : I
Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421
I : : I II II : MM I M : :
Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475
: : I : : I MM I I M | : : | : : : I
Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ DRVGLLYQFVGATPYTGMLNAVNLFPVLR 446

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535
I ; I -Mil I i ii • i • ii ii- i
Db 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499

Qy 536 WLVVFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577
:|!M : I : I:: Ml lh
Db 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFL 549

Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625
I: : II :| ::| I I: :| : |::: :
Db 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597



RESULT 7 
US-10-090-455-6 

; Sequence 6, Application US/10090455
; Publication No. US20030027259A1
; GENERAL INFORMATION:
; APPLICANT: Chen, Hongyun
; APPLICANT: Le Bihan, Stephane
; TITLE OF INVENTION: NOVEL ABCG4 TRANSPORTER AND USES THEREOF
; FILE REFERENCE: 100103.406
; CURRENT APPLICATION NUMBER: US/10/090,455
; CURRENT FILING DATE: 2002-03-01
; NUMBER OF SEQ ID NOS: 17
; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 6
; LENGTH: 651
; TYPE: PRT
; ORGANISM: Homo sapiens
US-10-090-455-6 

Query Match 19.9%; Score 697; DB 14; Length 651; 

Best Local Similarity 28.9%; Pred. No. 7.4e-61; 



Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16;


Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75
||:||| || | : : | :: I : : I I : 1 |:: : : 1
Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134
I : : : : 1 1 1 1 1 : : 1 : 1 1 1 1 1 : : 1 1 1 : : 1 1 1 1 1 : : : :
Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194
||: : : 1 : : 1 1 : 1 1 : 1 1 1 1 1 1 1 : 1 : : 1 : 1 : 1 1 1 : 1 1
Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254
II II : 1 1 : 1 : 1 1 1 1 1 1 1 1 1 1 1 1 : 1 : : : 1 1 1 1 : 1 1 1 1 1 : : 1
Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314
| | | : | | : I : : : : 1 1 1 1 1 : : 1 : 1 1 1 : : : : 1 1 : 1 1 : : 1 Mill:
Db 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362
III II 1 : 1 M 1 : 1 : 1 : 1 : 1 : : 1 : : : : | : : : 1
Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL- 348

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421
i - • i i i i i • i - i i i i •
Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475
• i • * i i-ii i I i * i . . i . . . i
Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ DRVGLLYQFVGATPYTGMLNAVNLFPVLR 446

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535
I : I : II I I I I II : I : II I I : I
Db 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499

Qy
Db

Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625
I . . ii -i • • i i i • -i • i • • •
Db 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597



RESULT 8 
US-09-837-992-1 

; Sequence 1, Application US/09837992
; Patent No. US20020081687A1
; GENERAL INFORMATION:
; APPLICANT: Tian, Hui
; APPLICANT: Schultz, Joshua
; APPLICANT: Shan, Bei
; APPLICANT: Tularik Inc.
; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions
; TITLE OF INVENTION: and Methods of Use
; FILE REFERENCE: 018781-006020US
; CURRENT APPLICATION NUMBER: US/09/837,992
; CURRENT FILING DATE: 2001-04-18
; PRIOR APPLICATION NUMBER: US 60/198,465
; PRIOR FILING DATE: 2000-04-18
; PRIOR APPLICATION NUMBER: US 60/204,234
; PRIOR FILING DATE: 2000-05-15
; NUMBER OF SEQ ID NOS: 45
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 1
; LENGTH: 652
; TYPE: PRT
; ORGANISM: Mus musculus
; FEATURE:
; OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG)
; OTHER INFORMATION: amino acid sequence
US-09-837-992-1 

Query Match 19.6%; Score 688.5; DB 9; Length 652; 

Best Local Similarity 28.1%; Pred. No. 5.3e-60; 

Matches 188; Conservative 125; Mismatches 233; Indels 123; Gaps 16; 
Qy 45 NTLEVRDLNYQVDLASQV-PWFEQLAQFKMPWTSPSCQNSCELGI-QNLSFKVRSGQMLA 102
Db 37 HSLGVLHVSYSV--SNRVGPW WNIKSCQQKWDRQILKDVSLYIESGQIMC 84

Qy 103 IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLT 162
| : I I I I I : : 1 II 1 : 1 1 | : : : : | | : 1 - - 1 1 - 1 • II
Db 85 ILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLT 144

Qy 163 VRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVS 222
1 || I I : 1 : 1 1 : 1 : 1 : 1 1 1 : 1 1 1 II • 1 • 1 - 1 1 1 1 1 1 1 1
Db 145 VRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVS 203

Qy 223 IGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDL 282
I | I 1 : 1 : : : 1 1 1 1 1 : 1 1 1 1 1 : : 1 1 : 11 = : I : 1 : : : : 1 1 II 1 : : 1 : II
Db 204 IAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDK 263

Qy 283 VLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKA 342
: : : I I : : I : I : : 1 IIIII : 1 I I II ! M 1 II M : I 1 1 : I : I : :
Db 264 IAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRV 323

Qy 343 QSLAALF LEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPT 390
III : 1 : 1 1 1 : - 1 1
Db 324 QMLECAFKESDIYHKILENIERARYL KTLPM VPFKT 359

Qy 391 K-MPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGF--LYFGHGSIQLSFMD 447
1 II : 1 : 1 1 1 1 : : : : : : | : : 1 1 1
Db 360 KDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQD 419

Qy 448 TAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAY 507
||:: I : : | : : : I I : 1 : 1 1 1 1 1 : 1 II
Db 420 RVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIA 479

Qy 508 IIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNAL- 566
: | : || I 1 : 1 - 1 1 1 1 1 • 1
Db 480 TVIFSSVCYWTLGLYPEVARF GYFSAALLAPHLIGEFLTLVLL 522

Qy 567 YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFS 609
: : | | : | : : . . I . . 1 1 1 . - 1
Db 523 GIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEF- 581

Qy 610 RRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVI 648
1 : II 1 : 1 : : 1 : 1 1 : 1 1 :
Db 582 YGL NFTCGGSNTSML NHPMCAITQGVQFIEKTCPGATSRFTANFLILY 629

Qy 649 GLSGGFMVL 657
1 ::l
Db 630 GFIPALVIL 638





RESULT 9 

US-09-989-981A-2 

; Sequence 2, Application US/09989981A
; Publication No. US20030049730A1
; GENERAL INFORMATION:
; APPLICANT: Hobbs, Helen H.
; APPLICANT: Shan, Bei
; APPLICANT: Barnes, Robert
; APPLICANT: Tian, Hui
; APPLICANT: Tularik Inc.
; APPLICANT: Board of Regents, The University of Texas System
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
; FILE REFERENCE: 018781-007320US
; CURRENT APPLICATION NUMBER: US/09/989,981A
; CURRENT FILING DATE: 2002-07-23
; PRIOR APPLICATION NUMBER: US 60/252,235
; PRIOR FILING DATE: 2000-11-20
; PRIOR APPLICATION NUMBER: US 60/253,645
; PRIOR FILING DATE: 2000-11-28
; NUMBER OF SEQ ID NOS: 13
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 2
; LENGTH: 652
; TYPE: PRT
; ORGANISM: Mus musculus
; FEATURE:
; OTHER INFORMATION: mouse ABCG5 (mABCG5)
US-09-989-981A-2 

Query Match 19.6%; Score 688.5; DB 10; Length 652; 

Best Local Similarity 28.1%; Pred. No. 5.3e-60; 

Matches 188; Conservative 125; Mismatches 233; Indels 123; Gaps 16; 

Qy 45 NTLEVRDLNYQVDLASQV-PWFEQLAQFKMPWTSPSCQNSCELGI-QNLSFKVRSGQMLA 102
:: | | : : | I : : : I I I I III : I : : : I : I I I : :
Db 37 HSLGVLHVSYSV--SNRVGPW WNIKSCQQKWDRQILKDVSLYIESGQIMC 84

Qy 103 IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLT 162
I : I I I I I :: I I I I : I I | : : : : | | : I : : I I : I : I I
Db 85 ILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLT 144

Qy 163 VRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVS 222
I I I I I : I : I I : I : I : I I I : I I I II = I : I : I I I I I I I I
Db 145 VRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLSHVADQMIGSYNFGGISSGERRRVS 203

Qy 223 IGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDL 282
I I I I : I : : : I I I I I : I I I I I : : I I : I I : : I : I : : : : I I I I I : : I : I I
Db 204 IAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDK 263

Qy 283 VLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKA 342
: : : | I :: I : I : : I II II I : I I I I II : I I I I : I : I I I : I : I : :
Db 264 IAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRV 323

Qy 343 QSLAALF LEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPT 390
Ml :|: I I I: :| I
Db 324 QMLECAFKESDIYHKILENIERARYL KTLPM VPFKT 359

Qy 391 K-MPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGF--LYFGHGSIQLSFMD 447
| || : I : I I I I : : : : : : I : : I I : : : : I
Db 360 KDPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQD 419

Qy 448 TAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAY 507
||:: I : : I : : : I I : I : I I I I I : I I I
Db 420 RVGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIA 479

Qy 508 IIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNAL- 566
:|: 1111:1 Mill : I : I
Db 480 TVIFSSVCYWTLGLYPEVARF GYFSAALLAPHLIGEFLTLVLL 522

Qy 567 YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFS 609
: : I I : I : : : : I : : I I I : : I
Db 523 GIVQNPNIVNSIVALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEF- 581

Qy 610 RRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVI 648
I : I I I :| ::|: II :lh
Db 582 YGL NFTCGGSNTSML NHPMCAITQGVQFIEKTCPGATSRFTANFLILY 629

Qy 649 GLSGGFMVL 657
I : : I
Db 630 GFIPALVIL 638



RESULT 10 
US-09-866-866A-14 

Sequence 14, Application US/09866866A
Patent No. US20020102244A1
GENERAL INFORMATION:
APPLICANT: Sorrentino, Brian
APPLICANT: Schuetz, John
TITLE OF INVENTION: A Method of Identifying and/or Isolating Stem Cells
FILE REFERENCE: 1340-1-021CIP2
CURRENT APPLICATION NUMBER: US/09/866,866A
CURRENT FILING DATE: 2001-08-30
PRIOR APPLICATION NUMBER: 09/584,586
PRIOR FILING DATE: 2000-05-31
PRIOR APPLICATION NUMBER: PCT/US99/11825
PRIOR FILING DATE: 1999-05-27
PRIOR APPLICATION NUMBER: 60/086,988
PRIOR FILING DATE: 1998-05-28
NUMBER OF SEQ ID NOS: 27
SOFTWARE: PatentIn version 3.0
SEQ ID NO 14
LENGTH: 657
TYPE: PRT
ORGANISM: Mus musculus
US-09-866-866A-14 

Query Match 19.0%; Score 666; DB 9; Length 657; 

Best Local Similarity 28.0%; Pred. No. 1e-57;

Matches 178; Conservative 127; Mismatches 255; Indels 76; Gaps 14; 

Qy 91 LSF KVRSGQML AIIGSSGCGRASLLDVITGRG 122 

| | I I I : I I : : I I : I : I I : : I I I I I : I 

Db 37 LSFHHITYRVKVKSGFLVRKTVEKEILSDINGIMKPGLNAILGPTGGGKSSLLDVLAAR- 95 

Qy 123 HGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQA 182 

| I I : I I I I I : I : I I : : : I I I I I I I I : I I I I 
Db 96 KDPKGLSGDVLINGAP-QPAHFKCCSGYWQDDWMGTLTVRENLQFSAALRLPTTMKNH 154 

Qy 183 QRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSG 242 

::::|: :| II I : ||::|| ::||:IMII:I lll::|: -I II llllhl 
Db 155 EKNERINTIIKELGLEKVADSKVGTQFIRGISGGERKRTSIGMELITDPSILFLDEPTTG 214 



Qy 243 LDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQ 302

INN::: I I : : f I : : I : I I I I I I : I I I : I : I I : : I I I : : 
Db 215 LDSSTANAVLLLLKRMSKQGRTI I FSIHQPRYSI FKLFDSLTLLASGKLVFHGPAQKALE 274 

Qy 303 YFTAIGYPCPRYSNPADFYVDLTSIDRR SREQELATREKAQSLAALFLEKVRDLDD 358 

I I : I I I | : | | | I I : : I : : I : I I : : I : : : : I : 

Db 275 YFASAGYHCEPYNNPADFFLDVINGDSSAVMLNREEQDNEANKTEEPSKGEKPVI ENLSE 334 

Qy 359 FLWKA ETK-DLDEDTCVES SVTPLDTNCLPS PTKMPGAVQQFTTLI RRQI SNDFRD 413 

| : I I I : I I : : : I : I : I : I I I : 

Db 335 FYINSAIYGETKAELDQ LPGAQEKKGTSAFKEPVYVTSFCHQLRWIARRSFKNLLGN 391 

Qy 414 LPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYS 473 

: :: : || :|| : | :|| : ::|:| 
Db 392 PQASVAQLIVTVILGLIIGAIYFDLKYDAAGMQNRAGVLFFL TTNQCFS 440 

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGE-LPEHCAYIIIYGMPTYWLANLR 522 

I : : : I II I I I I : : : I I : I : I : : I : 

Db 441 SVSAVELFVVEKKLFIHEYISGYYRVSSYFFGKVMSDLLPMRFLPSVI FTCILYFMLGLK 500 

Qy 523 PGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSS 582

: I : : : I : I I I I I : I : : I I : : I I : 

Db 501 KTVDAFFIl^FTLIMVAYTASSMAL 560 

Qy 583 LWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLT IAVSGDKIL — S 631 

: : I : I I : I I : | : : I I : I : I : : I 

Db 561 IGPWLSWLQYFSIPRYGFTALQYNEFLGQEF-CPGFNVTDNSTCVNSYAICTGNEYLINQ 619 

Qy 632 AMELDSYPLYAIYLIVIGLSGGFMVLYYVSLRFIKQ 667 

: | | : | : :: : : I : : I : I I : I : 
Db 620 GI ELS PWGLWKNHVALACMI 1 1 FLTI AYLKLLFLKK 655 



RESULT 11 
US-10-108-605-245 

; Sequence 245, Application US/10108605
; Publication No. US20020160934A1
; GENERAL INFORMATION:
; APPLICANT: Broadus, Julie
; APPLICANT: Stam, Lynn
; APPLICANT: Bachmann, Jane
; APPLICANT: Kamdar, Kim
; TITLE OF INVENTION: NUCLEIC ACID SEQUENCES FROM DROSOPHILA MELANOGASTER THAT ENCODE
; TITLE OF INVENTION: PROTEINS ESSENTIAL FOR LARVAL VIABILITY AND USES THEREOF
; FILE REFERENCE: 31133B
; CURRENT APPLICATION NUMBER: US/10/108,605
; CURRENT FILING DATE: 2002-03-27
; PRIOR APPLICATION NUMBER: US 09/761,142
; PRIOR FILING DATE: 2001-01-16
; PRIOR APPLICATION NUMBER: US 60/176,418
; PRIOR FILING DATE: 2000-01-14
; NUMBER OF SEQ ID NOS: 361
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 245
; LENGTH: 663
; TYPE: PRT
; ORGANISM: Drosophila melanogaster
US-10-108-605-245 



Query Match 18.7%; Score 656; DB 13; Length 663; 

Best Local Similarity 30.3%; Pred. No. 1e-56;

Matches 178; Conservative 113; Mismatches 265; Indels 32; Gaps 10; 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGG--KIKSGQIWINGQPSSPQLVR 145

: : I : I : : I I : : I I I I I : : I I : : I I II : I I I I : : : 

Db 89 LKNVCGVAYPGELLAVMGSSGAGKTTLLNALAFRSPQGIQVSPSGMRLLNGQPVDAKEMQ 14 8 

Qy 146 KCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRV 205 

I : I : I : : : I I I I I I I : I : I I : II I I : I I I I I : I I : 
Db 149 ARCAYVQQDDLFIGSLTAREHLIFQAMVRMPRHLTYRQRVARVDQVIQELSLSKCQHTII 208 

Qy 206 G-NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRL 264

I I : I I I I I I I : I : : : I : I : I I I I I I I I I I I I I I I : : I : I : I : : : 

Db 209 GVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFTAHSWQVLKKLSQKGKT 2 68 

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDL 324

|::::||| l::| III :||| I HI I :|: :l II lllllll : 

Db 269 VILTIHQPSSELFELFDKILLMAEGRVAFLGTPSEAVDFFSYVGAQCPTNYNPADFYVQV 328 

Qy 325 TSIDRRSREQELATREKAQSLAALF-LEKV-RDLDDFLWKAETKDLDEDTCVESSVTPLD 382 

: : : I : : I : : : I : I I I I : : I I I : I : : I I : 

Db 329 LAV VPGREIESRDRIAKICDNFAISKVARDMEQLL ATKNLEK PLE 373 

Qy 383 TNCLPSP TKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGH 438 

| | | | :: I : :: : : : :::: I I :: I 

Db 374 QPENGYT YKATWFMQFRAVLWRSWLSVLKEPLLVKVRLIQTTMVAI LI GLI FLGQ 42 8 

Qy 439 GSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKIL 498 

I : I : : I : : I : I : I I : I II III: 

Db 429 QLTQVGVMNINGAIFLFLTNMTFQNVFATINVFTSELPVFMREARSRLYRCDTYFLGKTI 488 

Qy 499 GELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHM 558

Ml ::: I : I I I : I I I I I : : I 

Db 489 AELPLFLTVPLVFTAIAYPMIGLRAGVLHFFNCLALVTLVANVSTSFGYLISCASSSTSM 548 

Qy 559 AS FFSNALYNS FYLAGGFMINLS SLWTVPAWI SKVS FLRWCFEGLMKIQFS RRTYKM 615 

I : IIIIIM I : I : I : I : I : Ml: I : : 

Db 549 ALSVGPPVIIPFLLFGGFFLNSGSVPVYLKWLSYLSWFRYANEGLLINQWADVEPGEISC 608 

Qy 616 PLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYVSLR 663 

II II II : : II:: I I I I I :: I I 

Db 609 TSSNTTCPSSGKVILETLNFSAADLPLDYVGLAILIVSFRVLAYLALR 656 



RESULT 12 
US-09-981-353-35 

Sequence 35, Application US/09981353 
Patent No. US20020160382A1 
GENERAL INFORMATION: 
APPLICANT: Lasek, Amy W. 
APPLICANT: Jones, David A. 

TITLE OF INVENTION: GENES EXPRESSED IN COLON CANCER 
FILE REFERENCE: PA-0038 US



CURRENT APPLICATION NUMBER: US/09/981,353
CURRENT FILING DATE: 2001-10-11 
NUMBER OF SEQ ID NOS : 194 
SOFTWARE: PERL Program 
SEQ ID NO 35 
LENGTH: 655 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE: 

NAME/KEY: misc_feature 

OTHER INFORMATION: Incyte ID No. 5517972CD1
US-09-981-353-35 

Query Match 18.3%; Score 642.5; DB 9; Length 655; 

Best Local Similarity 27.2%; Pred. No. 2.4e-55; 

Matches 187; Conservative 139; Mismatches 273; Indels 89; Gaps 21; 

Qy 19 DTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSP 78 

:|:| I : : I I I I : : I : I I I =1 
Db 16 NTNG FPATASNDLKAFTEGA — VLSFHNICYRVKLKSGF LP 54

Qy 79 SCQNSCELGI-QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQ 137

| : | | I : : : : I : I I : I : I I : : II I I I : I : I I : I I I 

Db 55 -CRKPVEKEILSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGA 111

Qy 138 PSSPQLVRKC-VAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELR 196 

I II : | | : : : I I I I I I I I : I I I : : : : : I : I I I I 

Db 112 PRPANF--KCNSGYVVQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELG 169

Qy 197 LRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLS 256 

I : I I : : I I : : I I : I I I I I : I I I I : : I : : I II I I I I I : I I I I I I : - * I 
Db 170 LDKVADSKVGTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLK 229 

Qy 257 RLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSN 316

I : : I I : : I : I I I I I I : I II : I : I I : : I M : I I : I I I hi 
Db 230 RMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNN 289 

Qy 317 PADFYVDLTSIDRR S REQELATRE — KAQS LAALFLEKVRDL — DDFLWKAETK — 366 

I I I I : : I : : I : I I : : I : = I h — : : I I I I 

Db 290 PADFFLDIINGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYK-ETKAE 348

Qy 367 DLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 419 

: : I : : : I I : : I I : : 

Db 349 LHQLSGGEKKKKITVFKEISYTTSFC HQLRWVSKRSFKNLLGNPQASIA 397

Qy 420 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYS — 473

: : : I I : I I I : : I : I I : : : I : I 

Db 398 QIIVTVVLGLVIGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVSAVE 446

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGE-LPEHCAYIIIYGMPTYWLANLRPGLQPF 528 

|: : :| II II hi : II lh I:: hi I 

Db 447 LFVVEKKLFIHEYISGYYRVSSYFLGKLLSDLLPMRMLPSIIFTCIVYFMLGLKPKADAF 506

Qy 529 LLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPA 588

: : : I : I I I I I I : h : : I : : I h : : : : 

Db 507 FVMMFTLMMVAYSASSMALAIAAGQSVVSVATLLMTICFVFMMIFSGLLVNLTTIASWLS 566



Qy 589 WISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLT IAVSGDKIL — SAMELDSYP 639 

|: 11:11 :| : : I I I :|:: I ::l • 

Db 567 WLQYFSIPRYGFTALQHNEFLGQNF-CPGLNATGNNPCNYATCTGEEYLVKQGIDLSPWG 625 

Qy 640 LYAIYLIVIGLSGGFMVLYYVSLRFIKQ 667

I : : : : : I : : I : I I : I : 
Db 626 LWKNHVALACMIVIFLTIAYLKLLFLKK 653



RESULT 13 
US-10-120-687-61 

Sequence 61, Application US/10120687 
Publication No. US20030082155A1 
GENERAL INFORMATION: 
APPLICANT: Massachusetts General Hospital 

TITLE OF INVENTION: Stem Cells of the Islets of Langerhans and Their Use in Treating Diabetes

TITLE OF INVENTION: Mellitus 
FILE REFERENCE: 3284/1235B 

CURRENT APPLICATION NUMBER: US/10/120,687
CURRENT FILING DATE: 2002-04-11 
PRIOR APPLICATION NUMBER: US60/169082 
PRIOR FILING DATE: 1999-12-06 
PRIOR APPLICATION NUMBER: US 09/963,875 
PRIOR FILING DATE: 2001-09-25 
PRIOR APPLICATION NUMBER: US 60/215109 
PRIOR FILING DATE: 2000-06-28 
PRIOR APPLICATION NUMBER: US 60/238880 
PRIOR FILING DATE: 2000-10-06 
PRIOR APPLICATION NUMBER: US 09/731261 
PRIOR FILING DATE: 2000-12-06 
NUMBER OF SEQ ID NOS : 61 
SOFTWARE: PatentIn version 3.1
SEQ ID NO 61 
LENGTH: 655 
TYPE: PRT 

ORGANISM: Homo sapiens 
US-10-120-687-61 

Query Match 18.3%; Score 642.5; DB 14; Length 655; 

Best Local Similarity 27.2%; Pred. No. 2.4e-55; 

Matches 187; Conservative 139; Mismatches 273; Indels 89; Gaps 21 

Qy 19 DTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSP 78 

: I : I I : : I I I I : : I : I I I : I 
Db 16 NTNG- FPATASNDLKAFTEGA — VLSFHNICYRVKLKSGF LP 54 

Qy 79 SCQNSCELGI-QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQ 137 

| : | | I :: : : I : I I : I : I I : : I I I I h I : I I : I I I 

Db 55 -CRKPVEKEILSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGA 111 

Qy 138 PSSPQLVRKC-VAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELR 196 

| || : | | : : : I I I I I I I I : I I I : : : : : I : I I I I 

Db 112 PRPANF--KCNSGYVVQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELG 169

Qy 197 LRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLS 256

| : | | : : | | : : I I : I I I I I : I I I I : : I : : I II I I I II : I I I I I I : : : I 



Db 170 LDKVADSKVGTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLK 229



Qy 257 RLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSN 316

I : : I | : : | : | | | | I I : I I I : I : I I : : I I I : I I : I I I I : I 
Db 230 RMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNN 289 

Qy 317 PADFYVDLTS I DRR SREQELATRE — KAQSLAALFLEKVRDL — DDFLWKAETK — 366 

| | | | : : I : : I : I I : : I : : I I : : : : : I I I I 

Db 290 PADFFLDIINGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYK-ETKAE 348 

Qy 367 DLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 419 

: : I : : : I I : : I I : : 

Db 349 LHQLSGGEKKKKITVFKEISYTTSFC HQLRWVSKRSFKNLLGNPQASIA 397

Qy 420 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYS 473 

: : : I I : I I I : : I : I I : : : I : I 

Db 398 QIIVTVVLGLVIGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVSAVE 446

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGE-LPEHCAYIIIYGMPTYWLANLRPGLQPF 528

I : : : I I I I I I : I : I I I I : I : : I : I I 

Db 447 LFVVEKKLFIHEYISGYYRVSSYFLGKLLSDLLPMRMLPSIIFTCIVYFMLGLKPKADAF 506

Qy 529 LLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPA 588

: : : | : I I I I I I : I : : : I : : I I :: : : : 

Db 507 FVMMFTLMMVAYSASSMALAIAAGQSVVSVATLLMTICFVFMMIFSGLLVNLTTIASWLS 566

Qy 589 WISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLT IAVSGDKIL — SAMELDSYP 639 

I : I I : I I : | : : I I I : I : : I : : I : 

Db 567 WLQYFSIPRYGFTALQHNEFLGQNF-CPGLNATGNNPCNYATCTGEEYLVKQGIDLSPWG 625

Qy 640 LYAIYLIVIGLSGGFMVLYYVSLRFIKQ 667

I: :: : : |: : |: I hi: 
Db 626 LWKNHVALACMIVIFLTIAYLKLLFLKK 653



RESULT 14 
US-10-405-806-2 

; Sequence 2, Application US/10405806 

; Publication No. US20030232362A1 

; GENERAL INFORMATION: 

; APPLICANT: KOMATANI, HIDEYA 

; APPLICANT: HARA, YOSHIKAZU

; APPLICANT: KOTANI, HIDEHITO 

; APPLICANT: NAKAGAWA, RINAKO 

; TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF 

; FILE REFERENCE: 234985US0CONT

; CURRENT APPLICATION NUMBER: US/10/405,806

; CURRENT FILING DATE: 2003-04-03 

; PRIOR APPLICATION NUMBER: PCT/JP01/08112

; PRIOR FILING DATE: 2001-09-18 

; PRIOR APPLICATION NUMBER: JP2000-303441
; PRIOR FILING DATE: 2000-10-03 
; NUMBER OF SEQ ID NOS : 17 
; SOFTWARE: PatentIn version 3.2
; SEQ ID NO 2 

; LENGTH: 655

; TYPE: PRT



; ORGANISM: Homo sapiens 
US-10-405-806-2 



Query Match 18.3%; Score 642.5; DB 15; Length 655; 

Best Local Similarity 27.2%; Pred. No. 2.4e-55; 

Matches 187; Conservative 139; Mismatches 273; Indels 89; Gaps 21; 

Qy 19 DTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSP 78

: I : I I : : I I I I :: I : I I I : I 
Db 16 NTNG FPATASNDLKAFTEGA — VLSFHNICYRVKLKSGF LP 54

Qy 79 SCQNSCELGI-QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQ 137 

I : I I I : : : : I : I I : I : I I : : I I I II : I : I I : I I I 

Db 55 -CRKPVEKEILSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGA 111 

Qy 138 PSSPQLVRKC-VAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELR 196 

I || : | | : : : I I I I I I I I : I I I : : : : : I : II I I 

Db 112 PRPANF--KCNSGYVVQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELG 169

Qy 197 LRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLS 256 

I : I I : : I I : : I I : I I I I I : I I I h : h : I I I I I I I h I I I I I h : : I 
Db 170 LDKVADSKVGTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLK 229

Qy 257 RLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSN 316

I : : I I : : I : II I I I I : M I : I : I I : : I I I : I I : I I I hi 

Db 230 RMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNN 289

Qy 317 PADFYVDLTSIDRR SREQELATRE — KAQSLAALFLEKVRDL — DDFLWKAETK — 366 

||||::|: : I :||:: I : :||: :: : :| III 

Db 290 PADFFLDIINGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYK-ETKAE 348 

Qy 367 DLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 419 

: : I :: : I I : : I I : : 

Db 349 LHQLSGGEKKKKITVFKEISYTTSFC HQLRWVSKRSFKNLLGNPQASIA 397

Qy 420 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYS 473

: : : I I : I I I : : I : I I : : : I : I 

Db 398 QIIVTVVLGLVIGAIYFGLKNDSTGIQNRAGVLFFL — TTNQCFSSVSAVE 446

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGE-LPEHCAYIIIYGMPTYWLANLRPGLQPF 528 

|: : :| II II hi : II lh h: hi I 

Db 447 LFVVEKKLFIHEYISGYYRVSSYFLGKLLSDLLPMRMLPSIIFTCIVYFMLGLKPKADAF 506

Qy 529 LLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPA 588

: : : I : I I I I I I : h : : | : : | I : : : : : 

Db 507 FVMMFTLMMVAYSASSMALAIAAGQSVVSVATLLMTICFVFMMIFSGLLVNLTTIASWLS 566

Qy 589 WISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLT IAVSGDKIL — SAMELDSYP 639 

|: I I: I I :| : : I I I : h : I ::| : 

Db 567 WLQYFSIPRYGFTALQHNEFLGQNF-CPGLNATGNNPCNYATCTGEEYLVKQGIDLSPWG 625 

Qy 640 LYAIYLIVIGLSGGFMVLYYVSLRFIKQ 667

|: :: : : | : : |: I I : h 
Db 626 LWKNHVALACMIVIFLTIAYLKLLFLKK 653



RESULT 15 



US-09-961-086-1 

; Sequence 1, Application US/09961086 

; Publication No. US20030036645A1 

; GENERAL INFORMATION: 

; APPLICANT: UNIVERSITY OF MARYLAND, BALTIMORE 
; APPLICANT: ROSS, Douglas D. 
; APPLICANT: DOYLE, L. Austin 
; APPLICANT: ABRUZZO, Lynne 

; TITLE OF INVENTION: BREAST CANCER RESISTANCE PROTEIN (BCRP) AND THE DNA 
; TITLE OF INVENTION: WHICH ENCODES IT 
; FILE REFERENCE: EP19376-019 

; CURRENT APPLICATION NUMBER: US/09/961,086 

; CURRENT FILING DATE: 2001-09-21 

; PRIOR APPLICATION NUMBER: US 60/073,763 

; PRIOR FILING DATE: 1998-02-05 

; PRIOR APPLICATION NUMBER: PCT/US99/02577

; PRIOR FILING DATE: 1999-02-05 

; NUMBER OF SEQ ID NOS : 7 

; SOFTWARE: PatentIn Ver. 2.1

; SEQ ID NO 1 

; LENGTH: 655
; TYPE: PRT 

; ORGANISM: Homo sapiens 
US-09-961-086-1 



Query Match 18.3%; Score 640.5; DB 10; Length 655; 

Best Local Similarity 27.2%; Pred. No. 3.8e-55; 

Matches 187; Conservative 139; Mismatches 273; Indels 89; Gaps 21; 



Qy 19 DTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSP 78

: 1 : 1 | : : | | 1 1 : : 1 : 1 1 1 : 1

Db 16 NTNG FPATASNDLKAFTEGA — VLSFHNICYRVKLKSGF LP 54

Qy 79 SCQNSCELGI-QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQ 137

I : I I 1 : : : : 1 : 1 1 : 1 : 1 1 : : 1 1 1 1 1 : 1 : 1 1 : 1 1 1

Db 55 -CRKPVEKEILSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGA 111

Qy 138 PSSPQLVRKC-VAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELR 196

I II : | | : : : I I I I I 1 1 1 : II I : : : : : | : 1 1 1 1

Db 112 PRPANF--KCNSGYVVQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELG 169

Qy 197 LRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLS 256

1 : I 1 : : 1 1 : : 1 1 : 1 1 1 1 1 : 1 1 1 1 : : 1 : : 1 II 1 1 1 II : 1 1 1 1 1 1 : : : 1

Db 170 LDKVADSKVGTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLK 229

Qy 257 RLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSN 316

1 : : 1 | : : I : I I I 1 1 1 : 1 1 1 : 1 : 1 1 : : 1 1 1 : 1 1 : 1 1 1 1 : 1

Db 230 RMSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNN 289

Qy 317 PADFYVDLTSIDRR SREQELATRE — KAQSLAALFLEKVRDL — DDFLWKAETK — 366

I I I I : : 1 : : 1 : I I : : 1 : : 1 1 : : : : Mill

Db 290 PADFFLDIINGDSTAVALNREEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYK-ETKAE 348

Qy 367 DLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 419

: : 1 :: : 1 | : :| I : :

Db 349 LHQLSGGEKKKKITVFKEISYTTSFC HQLRWVSKRSFKNLLGNPQASIA 397



Qy 420 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYS 473 

:: : II :||| : : I :|| : ::|:| 
Db 398 QIIVTVVLGLVIGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFSSVSAVE 446

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGE-LPEHCAYIIIYGMPTYWLANLRPGLQPF 528

I: : :| II II hi : II lh I- hi I 

Db 447 LFVVEKKLFIHEYISGYYRVSSYFLGKLLSDLLPMTMLPSIIFTCIVYFMLGLKPKADAF 506

Qy 529 LLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPA 588

: : : I : I I I I I I : h : : I : : I I : : : : : 

Db 507 FVMMFTLMKVAYSASSMALAIAAGQSVVSVATLLMTICFVFMMIFSGLLVNLTTIASWLS 566

Qy 589 WISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLT IAVSGDKIL — SAMELDSYP 639

I : I I : I I : | : : I I I : h : I :: I : 

Db 567 WLQYFSIPRYGFTALQHNEFLGQNF-CPGLNATGNNPCNYATCTGEEYLVKQGIDLSPWG 625 

Qy 640 LYAIYLIVIGLSGGFMVLYYVSLRFIKQ 667

I : : : : : I : : h I h I : 
Db 626 LWKNHVALACMIVIFLTIAYLKLLFLKK 653

Search completed: February 27, 2004, 07:34:07 
Job time : 31.2443 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 



Run on: 



February 27, 2004, 06:40:43 ; Search time 10.4203 Seconds 

(without alignments) 
3362.970 Million cell updates/sec 



Title: US-09-989-981A-8

Perfect score: 3506 

Sequence: 1 MAGKAAEERGLPKGATPQDT FMVLYYVSLRFIKQKPSQDW 673 

Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5 

Searched: 141681 seqs, 52070155 residues 

Total number of hits satisfying chosen parameters: 141681 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database : 



SwissProt 42:* 
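
The run parameters above name a Smith-Waterman ("sw") model with a gap-open value of 10.0 and a gap-extension value of 0.5. As a rough illustration only, the sketch below computes a local alignment score with affine gaps using the Gotoh recurrences. It is not GenCore's implementation: the toy scoring function stands in for BLOSUM62, and the convention that a gap of length k costs gap_open + k * gap_ext is an assumption, since the report does not state how the two gap values are combined.

```python
from math import inf


def smith_waterman_score(a, b, score, gap_open=10.0, gap_ext=0.5):
    """Best local alignment score of a vs b with affine gaps (Gotoh).

    Assumption: a gap of length k costs gap_open + k * gap_ext.
    """
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j); floor of 0 makes it local.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    # E / F: best score ending in a gap along b / along a.
    E = [[-inf] * (m + 1) for _ in range(n + 1)]
    F = [[-inf] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(E[i][j - 1] - gap_ext,
                          H[i][j - 1] - gap_open - gap_ext)
            F[i][j] = max(F[i - 1][j] - gap_ext,
                          H[i - 1][j] - gap_open - gap_ext)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            H[i][j] = max(0.0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def toy_score(x, y):
    """Hypothetical stand-in for a substitution matrix such as BLOSUM62."""
    return 5.0 if x == y else -4.0
```

With a real substitution matrix supplied as `score`, the same function would give a score comparable in kind to the "Score" column, though not necessarily identical to GenCore's figures.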



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
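
The percentage columns in this report appear to follow directly from the other figures. The sketch below is inferred from the numbers printed in the report, not from GenCore documentation: "Query Match" looks like the score divided by the perfect score, and "Best Local Similarity" looks like the match count divided by the total number of aligned columns. The function names are illustrative.

```python
def query_match_pct(score, perfect_score):
    """Score as a percentage of the query's self-alignment (perfect) score."""
    return round(100.0 * score / perfect_score, 1)


def best_local_similarity_pct(matches, conservative, mismatches, indels):
    """Identical positions as a percentage of all aligned columns."""
    aligned = matches + conservative + mismatches + indels
    return round(100.0 * matches / aligned, 1)
```

For example, a score of 640.5 against a perfect score of 3506 gives 18.3, and 187 matches, 139 conservative, 273 mismatches and 89 indels give 27.2, which agrees with the values printed for those alignments.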



SUMMARIES 



                     %
Result             Query
   No.    Score    Match  Length  DB  ID           Description

     1     3502     99.9     673   1  ABG8_HUMAN   Q9h221 homo sapien
     2     2873     81.9     673   1  ABG8_MOUSE   Q9dbm0 mus musculu
     3   2814.5     80.3     694   1  ABG8_RAT     P58428 rattus norv
     4      713     20.3     652   1  ABG5_RAT     Q99pe7 rattus norv
     5      697     19.9     651   1  ABG5_HUMAN   Q9h222 homo sapien
     6    691.5     19.7     652   1  ABG5_MOUSE   Q99pe8 mus musculu
     7      656     18.7     687   1  WHIT_DROME   P10090 drosophila
     8      653     18.6    1294   1  YOH5_YEAST   Q08234 saccharomyc
     9    640.5     18.3     655   1  ABG2_HUMAN   Q9unq0 homo sapien
    10      627     17.9     695   1  WHIT_ANOGA   Q27256 anopheles g
    11    623.5     17.8     679   1  WHIT_CERCA   Q17320 ceratitis c
    12      621     17.7     666   1  ABG1_MOUSE   Q64343 mus musculu
    13    620.5     17.7     677   1  WHIT_LUCCU   Q05360 lucilia cup
    14      617     17.6     678   1  ABG1_HUMAN   P45844 homo sapien
    15      600     17.1     598   1  YPC3_CAEEL   Q11180 caenorhabdi
    16      583     16.6     709   1  WHIT_ANOAL   Q16928 anopheles a
    17    573.5     16.4     646   1  ABG4_HUMAN   Q9h172 homo sapien
    18    562.5     16.0    1049   1  ADP1_YEAST   P25371 saccharomyc
    19      552     15.7     666   1  SCRT_DROME   P45843 drosophila
    20      511     14.6     610   1  YQ5C_CAEEL   Q09466 caenorhabdi
    21      464     13.2    1564   1  PDRA_YEAST   P51533 saccharomyc
    22    463.5     13.2     675   1  BROW_DROME   P12428 drosophila
    23    452.5     12.9     650   1  ABG3_MOUSE   Q99p81 mus musculu
    24      437     12.5     668   1  BROW_DROVI   Q24739 drosophila
    25    434.5     12.4    1499   1  CDR2_CANAL   P78595 candida alb
    26      431     12.3    1529   1  PDRF_YEAST   Q04182 saccharomyc
    27    424.5     12.1    1501   1  CDR1_CANAL   P43071 candida alb
    28      412     11.8    1490   1  CDR4_CANAL   O74676 candida alb
    29      401     11.4    1333   1  YN99_YEAST   P53756 saccharomyc
    30      397     11.3    1501   1  SNQ2_YEAST   P32568 saccharomyc
    31    388.5     11.1    1501   1  CDR3_CANAL   O42690 candida alb
    32    388.5     11.1    1530   1  BFR1_SCHPO   P41820 schizosacch
    33      385     11.0    1511   1  PDR5_YEAST   P33302 saccharomyc
    34    349.5     10.0    1511   1  PDRC_YEAST   Q02785 saccharomyc
    35      333      9.5    1410   1  PDRB_YEAST   P40550 saccharomyc
    36    270.5      7.7     670   1  NRTC_SYNY3   P73450 synechocyst
    37    252.5      7.2     894   1  YHIH_ECOLI   P37624 escherichia
    38      251      7.2     371   1  Y40S_RHISN   P55604 rhizobium s
    39      250      7.1    1704   1  ABC3_HUMAN   Q99758 homo sapien
    40    248.5      7.1     355   1  CYSA_SYNY3   P74548 synechocyst
    41      248      7.1     362   1  AGLK_RHIME   Q9z3r9 rhizobium m
    42      244      7.0     659   1  NRTC_SYNP7   P38045 synechococc
    43      243      6.9     326   1  CYSA_PSESM   Q88as5 pseudomonas
    44    241.5      6.9     332   1  SMOK_RHOSH   P54933 rhodobacter
    45      241      6.9     236   1  CYSA_CHLVU   P56344 chlorella v

ALIGNMENTS 



RESULT 1 
ABG8_HUMAN

ID ABG8_HUMAN STANDARD; PRT; 673 AA. 

AC Q9H221; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2) . 

GN ABCG8 . 

OS Homo sapiens (Human) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A. , VARIANTS SITOSTEROLEMIA THR-231; GLN-263; ARG-574 

RP AND ARG-596, AND VARIANT CYS-54 . 

RX MEDLINE=20553648; PubMed=11099417 ; 

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J., 

RA Kwiterovich P., Shan B., Barnes R. , Hobbs H.H.; 

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 

RL Science 290:1771-1775(2000). 

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2), VARIANTS SITOSTEROLEMIA 



RP HIS-184; THR-231; GLN-263; HIS-405; PRO-501; SER-543; PRO-572; 

RP GLU-574; ARG-574; ARG-596 AND PHE-570 DEL, AND VARIANTS HIS-19; 

RP CYS-54; LYS-238; VAL-259; LYS-400; ARG-575 AND ALA-632. 

RC TISSUE=Liver; 

RX MEDLINE=21344600; PubMed=11452359;

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H. , 

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E. , 

RA Pandya A., Brewer H.B. Jr., Salen G. , Dean M. , Srivastava A.K., 

RA Patel S.B. ; 

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic

RT structure and spectrum of mutations involving sterolin-1 and

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively.";

RL Am. J. Hum. Genet. 69:278-290(2001).

RN [3] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207 ; 

RA Schmitz G., Langmann T., Heimerl S.; 

RT "Role of ABCG1 and other ABCG family members in lipid metabolism."; 

RL J. Lipid Res. 42:1513-1520(2001). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to 
CC ABCG5 along a pathway regulating dietary-sterol absorption and

CC excretion. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=2; 

CC Name=1;

CC IsoId=Q9H221-1; Sequence=Displayed;

CC Name=2 ; 

CC IsoId=Q9H221-2; Sequence=VSP_000052 ; 

CC Note=Minor form detected in approximately 10% of the cDNA 

CC clones; 

CC -!- TISSUE SPECIFICITY: Strongly expressed in the liver, lower levels 
CC in the small intestine and colon. Detectable in a wide variety of 

CC human tissues. 

CC -!- DISEASE: Defects in ABCG8 are a cause of sitosterolemia 

CC [MIM:210250]; also known as phytosterolemia or shellfish

CC sterolemia. It is a rare autosomal recessive disorder 

CC characterized by increased intestinal absorption of all sterols 

CC including cholesterol, plant and shellfish sterols, and decreased 

CC biliary excretion of dietary sterols into bile. Sitosterolemia 

CC patients have hypercholesterolemia, very high levels of plant 

CC sterols in the plasma, and frequently develop tendon and tuberous 

CC xanthomas, accelerated atherosclerosis and premature coronary 

CC artery disease. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 



CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 
CC or send an email to license@isb-sib.ch).



CC 














DR EMBL; AF320294; AAG40004.1;
DR EMBL; AF324494; AAK84078.1;
DR EMBL; AF351824; AAK84663.1;
DR EMBL; AF351812; AAK84663.1; JOINED.
DR EMBL; AF351813; AAK84663.1; JOINED.
DR EMBL; AF351814; AAK84663.1; JOINED.
DR EMBL; AF351815; AAK84663.1; JOINED.
DR EMBL; AF351816; AAK84663.1; JOINED.
DR EMBL; AF351817; AAK84663.1; JOINED.
DR EMBL; AF351818; AAK84663.1; JOINED.
DR EMBL; AF351819; AAK84663.1; JOINED.
DR EMBL; AF351820; AAK84663.1; JOINED.
DR EMBL; AF351821; AAK84663.1; JOINED.
DR EMBL; AF351822; AAK84663.1; JOINED.
DR EMBL; AF351823; AAK84663.1; JOINED.
DR Genew; HGNC:13887; ABCG8.
DR MIM; 605460;
DR MIM; 210250;
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
KW Glycoprotein; Transmembrane; Transport; Alternative splicing;
KW Polymorphism; Disease mutation.
FT DOMAIN 1 416 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 417 437 1 (POTENTIAL).
FT DOMAIN 438 447 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 448 468 2 (POTENTIAL).
FT DOMAIN 469 492 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 493 513 3 (POTENTIAL).
FT DOMAIN 514 531 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 532 552 4 (POTENTIAL).
FT DOMAIN 553 569 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 570 590 5 (POTENTIAL).
FT DOMAIN 591 639 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 640 660 6 (POTENTIAL).
FT DOMAIN 661 673 CYTOPLASMIC (POTENTIAL).
FT CARBOHYD 619 619 N-LINKED (GLCNAC...) (POTENTIAL).
FT VARSPLIC 376 376 Missing (in isoform 2).
FT /FTId=VSP_000052.
FT VARIANT 19 19 D -> H.
FT /FTId=VAR_012250.
FT VARIANT 54 54 Y -> C.
FT /FTId=VAR_012251.
FT VARIANT 184 184 R -> H (in sitosterolemia).
FT /FTId=VAR_012252.
FT VARIANT 231 231 P -> T (in sitosterolemia).
FT /FTId=VAR_012253.
FT VARIANT 238 238 E -> K.
FT /FTId=VAR_012254.
FT VARIANT 259 259 A -> V.
FT /FTId=VAR_012255.
FT VARIANT 263 263 R -> Q (in sitosterolemia).
FT /FTId=VAR_012256.
FT VARIANT 400 400 T -> K.
FT /FTId=VAR_012257.
FT VARIANT 405 405 R -> H (in sitosterolemia).
FT /FTId=VAR_012258.
FT VARIANT 501 501 L -> P (in sitosterolemia).
FT /FTId=VAR_012259.
FT VARIANT 543 543 R -> S (in sitosterolemia).
FT /FTId=VAR_012260.
FT VARIANT 570 570 Missing (in sitosterolemia).
FT /FTId=VAR_012261.
FT VARIANT 572 572 L -> P (in sitosterolemia).
FT /FTId=VAR_012262.
FT VARIANT 574 574 G -> E (in sitosterolemia).
FT /FTId=VAR_012263.
FT VARIANT 574 574 G -> R (in sitosterolemia).
FT /FTId=VAR_012264.
FT VARIANT 575 575 G -> R.
FT /FTId=VAR_012265.
FT VARIANT 596 596 L -> R (in sitosterolemia).
FT /FTId=VAR_012266.
FT VARIANT 632 632 V -> A.
FT /FTId=VAR_012267.

SQ SEQUENCE 673 AA; 75678 MW; 594AFD1D6C1BB50F CRC64; 

Query Match 99.9%; Score 3502; DB 1; Length 673; 

Best Local Similarity 99.9%; Pred. No. 1.6e-251; 

Matches 672; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 

Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 

I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I M I I I I I I I I I I I II I I I I I I I II I I I I I 
Db 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 

Qy 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

Qy 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
Db 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240 

I I I I I I I I I ! I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240

Qy 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 
Db 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 

Qy 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360

Qy 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I 

Db 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 



Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480

| | | | | M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I 
Db 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480

Qy 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVF 540

I I I I I I I I I I I I I I II II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVF 540

Qy 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600 

I I I I I I I I I II I I I I I I I I I I I I M I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600

Qy 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSVMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

Qy 661 SLRFIKQKPSQDW 673 

I I II I I I I I I I I I 
Db 661 SLRFIKQKPSQDW 673 



RESULT 2 
ABG8_MOUSE 

ID ABG8_MOUSE STANDARD; PRT; 673 AA. 

AC Q9DBM0; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2) . 

GN ABCG8 . 

OS Mus musculus (Mouse) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2). 

RC STRAIN=C57BL/6; TISSUE=Liver ; 

RX MEDLINE=21344600; PubMed=11452359;

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H., 

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E. , 

RA Pandya A. , Brewer H.B. Jr., Salen G., Dean M. , Srivastava A.K., 

RA Patel S.B.; 

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic

RT structure and spectrum of mutations involving sterolin-1 and 

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively."; 

RL Am. J. Hum. Genet. 69:278-290(2001). 

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORM 1). 

RC STRAIN=C57BL/6J; TISSUE=Liver ; 

RX MEDLINE=21085660; PubMed=11217851;

RA Kawai J., Shinagawa A., Shibata K., Yoshino M., Itoh M., Ishii Y.,

RA Arakawa T., Hara A., Fukunishi Y., Konno H., Adachi J., Fukuda S.,

RA Aizawa K., Izawa M., Nishi K., Kiyosawa H., Kondo S., Yamanaka I.,

RA Saito T., Okazaki Y., Gojobori T., Bono H., Kasukawa T., Saito R.,

RA Kadota K., Matsuda H.A., Ashburner M., Batalov S., Casavant T.,

RA Fleischmann W., Gaasterland T., Gissi C., King B., Kochiwa H.,

RA Kuehl P., Lewis S., Matsuo Y., Nikaido I., Pesole G., Quackenbush J.,

RA Schriml L.M., Staubli F., Suzuki R., Tomita M., Wagner L., Washio T.,

RA Sakai K., Okido T., Furuno M., Aono H., Baldarelli R., Barsh G.,

RA Blake J., Boffelli D., Bojunga N., Carninci P., de Bonaldo M.F.,

RA Brownstein M.J., Bult C., Fletcher C., Fujita M., Gariboldi M.,

RA Gustincich S., Hill D., Hofmann M., Hume D.A., Kamiya M., Lee N.H.,

RA Lyons P., Marchionni L., Mashima J., Mazzarelli J., Mombaerts P.,

RA Nordone P., Ring B., Ringwald M., Rodriguez I., Sakamoto N.,

RA Sasaki H., Sato K., Schoenbach C., Seya T., Shibata Y., Storch K.-F.,

RA Suzuki H., Toyo-oka K., Wang K.H., Weitz C., Whittaker C., Wilming L.,

RA Wynshaw-Boris A., Yoshida K., Hasegawa Y., Kawaji H., Kohtsuki S.,

RA Hayashizaki Y.;

RT "Functional annotation of a full-length mouse cDNA collection,"; 

RL Nature 409:685-690(2001). 

RN [3] 

RP TISSUE SPECIFICITY, AND INDUCTION. 

RX MEDLINE=20553648; PubMed=11099417;

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 

RL Science 290:1771-1775(2000). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to
CC ABCG5 along a pathway regulating dietary-sterol absorption and

CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=2; 

CC Name=1;

CC IsoId=Q9DBM0-1; Sequence=Displayed;

CC Name=2;

CC IsoId=Q9DBM0-2; Sequence=VSP_000053;

CC Note=No experimental confirmation available; 

CC -!- TISSUE SPECIFICITY: Expressed in the intestine and, at lower 

CC level, in the liver. 

CC -!- INDUCTION: Upregulated by cholesterol feeding. Possibly mediated 
CC by the liver X receptor/retinoid X receptor (LXR/RXR) pathway.

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF324495; AAK84079.1; -. 

DR EMBL; AK004871; BAB23630.1; -. 

DR MGD; MGI:1914720; Abcg8.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1. 



DR ProDom; PD000006; ABC_transporter; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
KW Glycoprotein; Transmembrane; Transport; Alternative splicing.
FT DOMAIN 1 413 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 414 434 1 (POTENTIAL).
FT DOMAIN 435 447 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 448 468 2 (POTENTIAL).
FT DOMAIN 469 496 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 497 517 3 (POTENTIAL).
FT DOMAIN 518 526 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 527 547 4 (POTENTIAL).
FT DOMAIN 548 569 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 570 590 5 (POTENTIAL).
FT DOMAIN 591 639 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 640 660 6 (POTENTIAL).
FT DOMAIN 661 673 CYTOPLASMIC (POTENTIAL).
FT CARBOHYD 619 619 N-LINKED (GLCNAC...) (POTENTIAL).
FT VARSPLIC 377 377 Missing (in isoform 2).
FT /FTId=VSP_000053.
SQ SEQUENCE 673 AA; 75995 MW; 78012611A5DF2589 CRC64;



Query Match 81.9%; Score 2873; DB 1; Length 673; 

Best Local Similarity 81.8%; Pred. No. 5.8e-205; 

Matches 551; Conservative 52; Mismatches 69; Indels 2; Gaps 2 

Qy 1 MAGKAAEERGLPKGATPQDTS-GLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLA 59 

III II I I II I I I I I M I I I I I I I II I I I I I I I I I I I I I I I I I I : I 
Db 1 MAEKTKEETQLWNGTVLQDASQGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIA 60 

Qy 60 SQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVIT 119 

I I I I I I I I I I I I I : I I I I I : I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 61 SQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVIT 120 

Qy 120 GRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTF 179 

I I I I I I I : I I I I I I I I I I I I : I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 GRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTF 180 

Qy 180 SQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEP 239

I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 SQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEP 240

Qy 240 TSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQH 299

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 TSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQ 300 

Qy 300 MVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQEIATREKAQSLAALFLEKVRDLDDF 359 

I I I I I I : I I : I I I II I I I I I I I I I I I I I I I I I : I : I : I I I I I I I I I I I I I I I I : III 
Db 301 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 360 

Qy 360 LWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 419 

I I I I I I : I : I I 11:1 : : : I I : : I I : I I I I I I I I I I I I I I I I I I I 

Db 361 LWKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLI 419 

Qy 420 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLY 479

I I : I I I I I I : I I I I I : I I I : I I I I I I I I M I I I I I I I I I I II I I I I : I I I : I I I : I I I 
Db 420 HGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDVVSKCHSERSMLY 479



Qy 480 YELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLW 539 

I I I I I I I I I I I I I I II I I I I I I I I I I I : I I I II III I I I I : I I I I I I I I I I I I 
Db 480 YELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLW 539 

Qy 540 FCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWC 599

I I I I I I I I I : I : I I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I : I I I I I I 
Db 540 FCCRTMTUoTVASAMLPTFHMSSFFCNALYNSFYLTAGmiNLDNLWIVPAWISKLSFLRWC 599 

Qy 600 FEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYY 659 

I I I I : I I I : I : I I I : : II : : I I I : I : I : I I I I I I I I I I I : I I I : II I 
Db 600 FSGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYY 659 

Qy 660 VSLRFIKQKPSQDW 673 

: I I : I I I I III 

Db 660 LSLKLIKQKSIQDW 673 



RESULT 3 
ABG8_RAT 

ID ABG8_RAT STANDARD; PRT; 694 AA.

AC P58428; Q8CIQ5; Q923R7; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 15-MAR-2004 (Rel. 43, Last sequence update) 

DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2).

GN ABCG8.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2). 

RC STRAIN=Sprague-Dawley; 

RX MEDLINE=21344600; PubMed=11452359;

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H.,

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E.,

RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K.,

RA Patel S.B.;

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic

RT structure and spectrum of mutations involving sterolin-1 and

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively.";

RL Am. J. Hum. Genet. 69:278-290(2001).

RN [2] 

RP REVISIONS TO 3-4. 

RA Lu K., Yu H., Lee M.-H., Patel S.B.; 

RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases. 

RN [3] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 3), AND TISSUE SPECIFICITY. 

RC STRAIN=GH, SHR, SHRSP, Sprague-Dawley, Wistar, Wistar Kyoto, and WKA; 

RC TISSUE=Intestine, and Liver; 

RX PubMed=12783625;

RA Yu H., Pandit B., Klett E., Lee M.-H., Lu K., Helou K., Ikeda I.,

RA Egashira N., Sato M., Klein R., Batta A., Salen G., Patel S.B.;

RT "The rat STSL locus: characterization, chromosomal assignment, and 

RT genetic variations in sitosterolemic hypertensive rats."; 

RL BMC Cardiovasc. Disord. 3:4-4(2003). 



CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to
CC ABCG5 along a pathway regulating dietary-sterol absorption and

CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=3;

CC Name=3;

CC IsoId=P58428-3; Sequence=Displayed;

CC Name=1;

CC IsoId=P58428-1; Sequence=VSP_008767;

CC Name=2;

CC IsoId=P58428-2; Sequence=VSP_008767, VSP_000054;

CC Note=No experimental confirmation available; 

CC -!- TISSUE SPECIFICITY: Highest expression in liver, with lower levels 
CC in small intestine and colon. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF351785; AAK84831.2; -. 

DR EMBL; AY145899; AAN64276.1; 

DR EMBL; AF404109; AAK85393.1; -. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Glycoprotein; Transmembrane; Transport; Alternative splicing. 



FT DOMAIN 1 434 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 435 455 1 (POTENTIAL).
FT DOMAIN 456 468 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 469 489 2 (POTENTIAL).
FT DOMAIN 490 517 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 518 538 3 (POTENTIAL).
FT DOMAIN 539 547 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 548 568 4 (POTENTIAL).
FT DOMAIN 569 590 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 591 611 5 (POTENTIAL).
FT DOMAIN 612 650 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 651 671 6 (POTENTIAL).
FT DOMAIN 672 694 CYTOPLASMIC (POTENTIAL).
FT CARBOHYD 640 640 N-LINKED (GLCNAC...) (POTENTIAL).
FT VARSPLIC 56 77 Missing (in isoform 1 and isoform 2).
FT /FTId=VSP_008767.
FT VARSPLIC 398 398 Missing (in isoform 2).
FT /FTId=VSP_000054.
FT CONFLICT 3 4 EK -> QT (IN REF. 3).

SQ SEQUENCE 694 AA; 78236 MW; 67F67C195F417587 CRC64; 

Query Match 80.3%; Score 2814.5; DB 1; Length 694; 

Best Local Similarity 77.4%; Pred. No. 1.3e-200; 

Matches 538; Conservative 57; Mismatches 77; Indels 23; Gaps 2; 

Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQ 55

Ml M | I I I I I I I : I I I I I I I I I I I I I I I I I I I I II I I I I 
Db 1 MAEKTKEETQLWNGTVLQDASSLQDSVFSSESDNSLYFTYSGQSNTLEVRDLTYQGGTCL 60 

Qy 56 VDLASQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSG 98 

I I : I I I I I I I I I I I I I I : I I I I : I : I I I : II I I I I I I I 
Db 61 RSWGQEDPHMSLGLSESVDMASQVPWFEQLAQFKLPWRSRGSQDSWDLGIRNLSFKVRSG 120

Qy 99 QMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLL 158

I | | | I I I I : I I I I I : I I I I I I II I I I I : I I I I II I I I I II : I I I :: I I I I I I I I : I I I 
Db 121 QMLAIIGSAGCGRATLLDVITGRDHGGKMKSGQIWINGQPSTPQLIQKCVT^VRQQDQLL 180 

Qy 159 PNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGER 218

I I I I I I I I I I I I I I I I I : I I I I I I I I I M M I II I I I I I I I I : I I I I I 1111:11111 
Db 181 PNLTVRETLTFIAQMRLPKTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGER 240

Qy 219 RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFR 278 

I I I I I I I I I I I I I I I II I I I II I I I I II I I II M : I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 RRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVRTLSRLAKGNRLVLISLHQPRSDIFR 300 

Qy 279 LFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELAT 338

M I I I I I I I I I I I I I I I I I I I I I I I I : I I I I M I I I I I I I I I I I I I I I I I I I : I I I : I I 
Db 301 LFDLVLLMTSGTPIYLGVAQHMVQYFTSIGYPCPRYSNPADFYVDLTSIDRRSKEQEVAT 360

Qy 339 REKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQ 398 

I I I : I I I I I I I I I : I I I I I I I I I I I I I I I I I : : : I I : I I 

Db 361 MEKARLLAALFLEKVQGFDDFLWKAEAKSLDTGTYAVSQTLTQDTNC-GTAAELPGMIQQ 419 

Qy 399 FTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGAL 458 

I M I I I I I I I I I I I I I I I I I I I I I : llllhll I I I I I I I I I I I I I M 

Db 420 FTTLIRRQISNDFRDLPTLFIHGAEACLMSLIIGFLYYGHADKPLSFMDMAALLFMIGAL 479

Qy 459 IPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWL 518

I | | I I I I I I : I M : I I I :: II I I I I I I I I I I I I II I I : I II I I I I I I I : I I I I I I I I I 
Db 480 IPFNVILDWSKCHSERSLLYYELEDGLYTAGPYFFAKVLGELPEHCAYVIIYGMPIYWL 539 

Qy 519 ANLRPGLQPFLLHFLLWLWFCCRIM7VLAA7^LPTFHMASFFSNALYNSFYLAGGm^ 578 

I I I I I : II I I I : I : I I I I II M M I I I : I : I I I I I I : I I I I I I I I I I I I I I I 
Db 540 TNLRPGPELFLLHFMLLWLWFCCRTMAIAASAMLPTFHMSSFCCN7VLYNSFYLTAGFMI 599 

Qy 579 NLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSY 638 

I I : : I I I I I I I I I : I I I I I I I I I I : I I I : I : I I I I =1 II : : : I I : I : I : 
Db 600 NLNNLWIVPAWISKMSFLRWCFSGLMQIQFNGHIYTTQIGNLTFSVPGDAMVTAMDLNSH 659 

Qy 639 PLYAIYLIVIGLSGGFMVLYYVSLRFIKQKPSQDW 673 

I I II I I I I I I I : I II: I I I : I I : I I I I I III 

Db 660 PLYAIYLIVIGISCGFLSLYYLSLKFIKQKSIQDW 694 



RESULT 4 
ABG5_RAT



ID ABG5_RAT STANDARD; PRT; 652 AA. 

AC Q99PE7; Q8CIQ4; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 10-OCT-2003 (Rel. 42, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).

GN ABCG5.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Sprague-Dawley; TISSUE=Small intestine; 

RX MEDLINE=20578753; PubMed=11138003;

RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H.,

RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G.,

RA Dean M., Patel S.B.;

RT "Identification of a gene, ABCG5, important in the regulation of 

RT dietary cholesterol absorption."; 

RL Nat. Genet. 27:79-83(2001). 

RN [2] 

RP REVISION TO 2. 

RA Lu K., Lee M.-H., Patel S.B.; 

RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases.

RN [3] 

RP SEQUENCE FROM N.A., TISSUE SPECIFICITY, AND VARIANT CYS-583. 

RC STRAIN=GH, SHR, SHRSP, Sprague-Dawley, Wistar, Wistar Kyoto, and WKA; 

RX PubMed=12783625;

RA Yu H., Pandit B., Klett E., Lee M.-H., Lu K., Helou K., Ikeda I.,

RA Egashira N., Sato M., Klein R., Batta A., Salen G., Patel S.B.;

RT "The rat STSL locus: characterization, chromosomal assignment, and 

RT genetic variations in sitosterolemic hypertensive rats."; 

RL BMC Cardiovasc. Disord. 3:4-4(2003). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to
CC ABCG8 along a pathway regulating dietary-sterol absorption and

CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Expressed only in liver and intestine. 

CC -!- POLYMORPHISM: The polymorphism at position 583 is found in strains 

CC SHR, SHRSP and Wistar Kyoto which are both hypertensive and 

CC sitosterolemic. Strains which are hypertensive but not 

CC sitosterolemic do not contain a polymorphism at this position. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White)

CC subfamily.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF312714; AAG53098.3; -. 

DR EMBL; AY145899; AAN64275.1; 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Glycoprotein; Transmembrane; Transport; Polymorphism. 



FT 
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CYTOPLASMIC (POTENTIAL) . 
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1 (POTENTIAL). 
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EXTRACELLULAR (POTENTIAL). 


FT 


TRANSMEM 
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2 (POTENTIAL). 


FT 


DOMAIN 
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463 


CYTOPLASMIC (POTENTIAL) . 


FT TRANSMEM 464 484 3 (POTENTIAL).
FT DOMAIN 485 504 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 505 525 4 (POTENTIAL).
FT DOMAIN 526 529 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 530 550 5 (POTENTIAL).
FT DOMAIN 551 624 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 625 645 6 (POTENTIAL).
FT DOMAIN 646 652 CYTOPLASMIC (POTENTIAL).
FT NP_BIND 87 94 ATP (POTENTIAL).
FT CARBOHYD 585 585 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 592 592 N-LINKED (GLCNAC...) (POTENTIAL).
FT VARIANT 583 583 G -> C (in strains SHR, SHRSP and Wistar
FT Kyoto).
SQ SEQUENCE 652 AA; 73372 MW; 49FEF7372269299D CRC64;



Query Match 20.3%; Score 713; DB 1; Length 652; 

Best Local Similarity 30.0%; Pred. No. 4.1e-45; 

Matches 190; Conservative 115; Mismatches 232; Indels 96; Gaps 15; 

Qy 12 PKGAT-PQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQV-PWFEQLA 69

I : I I I : II II : I : II : : I I II : : : : I I I

Db 9 PEGARGPHNNRGSQ SSLEEGSV — TGSEARHSLGV — LNVSFSVSNRVGPW 55

Qy 70 QFKMPWTSPSCQNSCELGI-QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIK 128
I | I I : I : : : I : I I I : I : I I I I I : : I I I I : I I

Db 56

Qy 129

Db 111

Qy 189

Db 170

Qy 249

I I :: : I : I : I I I I I I I : I : I I : I II

I | : | I I II : II I : I I I I I I I I I I I I : I : : : I I I I I : I I I I

: :: I I I I : I I : I : : : : I I I I I : : I II : : : I I

Db 230 NHIVLLLVELARRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCGTPEEMLGFFNNCG 289



Qy 309 YPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDL 356 

II I I : I I I I I I : I I I I : I : I I I : I : I : : I I : I : I : I I 

Db 290 YPCPEHSNPFDFYMDLTSVDTQSREREIETYKRVQMLESAFRQSDICHKILENIERTRHL 349

Qy 357 DDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLP 415 

I : : I II II : I : I I I I : 
Db 350 KTLPM VP FKT KN P P GMFC KL GVLLRRVT RN LMRN KQ 385 

Qy 416 TLLIHGAEACLMSMTIGF — LYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYS 473 

: : : : : | : : I I : : : : I I I : : I : : I : : : 

Db 386 WIMRLVQNL I MGL FL I F YLL RVQNNML KGAVQD RVGLL YQLVGAT P YT GMLNAVN L F PM 445 

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFL 533 

II: I : I I I I I : I I I • I : I I I I : I 

Db 446 LRAVSDQESQDGLYQKWQMLLAYVLHALPFSIVATVIFSSVCYWTLGLYPEVARF 500 

Qy 534 LWLWFCCRIMALAAAALLPTFHMASFFSNAL YNSFYLAGG 575 

: I I I I : I : I : : I 

Db 501 GYFSAALIAPHLIGEFLTLVLLGMVQNPNIWSIVALLSISGLLIGSG 548

Qy 576 FMINLSSLWTVPAWISKVSFLRWCFEGLMKIQF 608 

| : I : : : : I : : I I I : : I 

Db 549 FI RNI EEMPI PLKI LGYFTFQKYCCEI LWNEF 581 



RESULT 5 
ABG5_HUMAN 

ID ABG5_HUMAN STANDARD; PRT; 651 AA. 

AC Q9H222; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).

GN ABCG5.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE FROM N.A., AND VARIANT GLU-604. 

RC TISSUE=Liver; 

RX MEDLINE=20553648; PubMed=11099417 ; 

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J., 

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.; 

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 

RL Science 290:1771-1775(2000). 
RN [2] 

RP SEQUENCE FROM N.A., VARIANTS SITOSTEROLEMIA HIS-389; HIS-419 AND 

RP PRO-419, AND VARIANT GLU-604. 

RC TISSUE=Liver; 

RX MEDLINE=20578753; PubMed=11138003;

RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H.,

RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G.,

RA Dean M., Patel S.B.;



RT "Identification of a gene, ABCG5, important in the regulation of 

RT dietary cholesterol absorption."; 

RL Nat. Genet. 27:79-83(2001). 

RN [3] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207;

RA Schmitz G., Langmann T., Heimerl S.; 

RT "Role of ABCG1 and other ABCG family members in lipid metabolism."; 

RL J. Lipid Res. 42:1513-1520(2001). 

RN [4] 

RP VARIANTS SITOSTEROLEMIA GLN-146; HIS-389; PRO-419; HIS-419 AND 

RP SER-550, AND VARIANT GLU-604. 

RX MEDLINE=21344600; PubMed=11452359 ; 

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H., 

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E.,

RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K.,

RA Patel S.B.;

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic

RT structure and spectrum of mutations involving sterolin-1 and

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively.";

RL Am. J. Hum. Genet. 69:278-290(2001).

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to
CC ABCG8 along a pathway regulating dietary-sterol absorption and

CC excretion.

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Strongly expressed in the liver, lower levels

CC in the small intestine and colon.

CC -!- DISEASE: Defects in ABCG5 are a cause of sitosterolemia

CC [MIM:210250]; also known as phytosterolemia or shellfish

CC sterolemia. It is a rare autosomal recessive disorder 

CC characterized by increased intestinal absorption of all sterols 

CC including cholesterol, plant and shellfish sterols, and decreased 

CC biliary excretion of dietary sterols into bile. Sitosterolemia 

CC patients have hypercholesterolemia, very high levels of plant 

CC sterols in the plasma, and frequently develop tendon and tuberous 

CC xanthomas, accelerated atherosclerosis and premature coronary 

CC artery disease. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily.

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).



CC 

DR EMBL; AF320293; AAG40003.1; 

DR EMBL; AF312715; AAG53099.1; 

DR Genew; HGNC:13886; ABCG5.

DR MIM; 605459; -. 

DR MIM; 210250; 



DR GO; GO:0030299; P:cholesterol absorption; NAS.
DR InterPro; IPR003593; AAA_ATPase.
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR SMART; SM00382; AAA; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
KW ATP-binding; Glycoprotein; Transmembrane; Transport; Polymorphism;
KW Disease mutation.
FT DOMAIN 1 383 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 384 404 1 (POTENTIAL).
FT DOMAIN 405 421 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 422 442 2 (POTENTIAL).
FT DOMAIN 443 462 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 463 483 3 (POTENTIAL).
FT DOMAIN 484 503 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 504 524 4 (POTENTIAL).
FT DOMAIN 525 528 CYTOPLASMIC (POTENTIAL).
FT TRANSMEM 529 549 5 (POTENTIAL).
FT DOMAIN 550 623 EXTRACELLULAR (POTENTIAL).
FT TRANSMEM 624 644 6 (POTENTIAL).
FT DOMAIN 645 651 CYTOPLASMIC (POTENTIAL).
FT NP_BIND 86 93 ATP (POTENTIAL).
FT CARBOHYD 584 584 N-LINKED (GLCNAC...) (POTENTIAL).
FT CARBOHYD 591 591 N-LINKED (GLCNAC...) (POTENTIAL).
FT VARIANT 146 146 E -> Q (in sitosterolemia).
FT /FTId=VAR_012244.
FT VARIANT 389 389 R -> H (in sitosterolemia).
FT /FTId=VAR_012245.
FT VARIANT 419 419 R -> H (in sitosterolemia).
FT /FTId=VAR_012246.
FT VARIANT 419 419 R -> P (in sitosterolemia).
FT /FTId=VAR_012247.
FT VARIANT 550 550 R -> S (in sitosterolemia).
FT /FTId=VAR_012248.
FT VARIANT 604 604 Q -> E.
FT /FTId=VAR_012249.
SQ SEQUENCE 651 AA; 72503 MW; 950BABFCBB6A1536 CRC64;



Query Match 19.9%; Score 697; DB 1; Length 651; 

Best Local Similarity 28.9%; Pred. No. 6.3e-44; 

Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16;



Qy 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75

|| : Ml || | : :|::| : : I I : ||:: : : I

Db 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64

Qy 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134

| :: : : I I II I : : I : I I I I I : : I I I : : I I I I I : : : :

Db 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115

Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194

||: : : I ::| I : II :|IMIII : I : : I : hll hll

Db 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174



Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254



II || : I I : I : I I I I I I I I I I I I : I : : : I I M : I II I I : : I 
Db 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIWL 234 

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314 

I I I : I I : I : : : : I I I I I : : I : I I I : : : : I I : I I : : I I I I I I : 

Db 235 LVELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294 

Qy 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362 

I I I I I I : I I I I : I : I : I : I : I : : .1 : : : : I : : : I 
Db 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348 

Qy 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421 

I : : I II II : I : I I I I : : : 

Db 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390 

Qy 422 AEACLMSMTIGFLYFG HGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSER 475 

: : | : : I I : I I I II: | : : | : : : I 

Db 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ D RVGL L YQ FVGAT P YT GMLNAVN L F P VLR 446 

Qy 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535

I : I : I I I I I I I I : I : II 11=1 

Db 447 AVS DQE S QDGL YQKWQMMLAYALHVL P FS WATMI FS S VC YWTLGLH P EVARF — 499 

Qy 536 WLWFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577 

: I I I I : I : I : : : I I I I : 

Db 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIAGVLVGSGFL 549

Qy 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625 

I : : II : | : : | | | : : I : I : : : : 

Db 550 RNIQEMPI PFKI I S YFTFQKYCSEI LWNEFYGLNFTCGSSNVSVTTN 597 



RESULT 6 
ABG5_MOUSE 

ID ABG5_MOUSE STANDARD; PRT; 652 AA. 

AC Q99PE8; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).

GN ABCG5.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57BL/6; TISSUE=Liver;

RX MEDLINE=20578753; PubMed=11138003;

RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H.,

RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G.,

RA Dean M., Patel S.B.;

RT "Identification of a gene, ABCG5, important in the regulation of 

RT dietary cholesterol absorption."; 

RL Nat. Genet. 27:79-83(2001). 

RN [2] 

RP TISSUE SPECIFICITY, AND INDUCTION. 



RX MEDLINE=20553648; PubMed=11099417;

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;

RT "Accumulation of dietary cholesterol in sitosterolemia caused by

RT mutations in adjacent ABC transporters.";

RL Science 290:1771-1775(2000). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to
CC ABCG8 along a pathway regulating dietary-sterol absorption and

CC excretion (By similarity).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Expressed in the intestine and, at lower 
CC level, in the liver. 

CC -!- INDUCTION: Upregulated by cholesterol feeding. Possibly mediated 
CC by the liver X receptor/retinoid X receptor (LXR/RXR) pathway.

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF312713; AAG53097.1; -. 

DR MGD; MGI:1351659; Abcg5.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Glycoprotein; Transmembrane; Transport. 



FT DOMAIN 1 385 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 386 406 1 (POTENTIAL).

FT DOMAIN 407 422 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 423 443 2 (POTENTIAL).

FT DOMAIN 444 463 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 464 484 3 (POTENTIAL).

FT DOMAIN 485 504 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 505 525 4 (POTENTIAL).

FT DOMAIN 526 529 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 530 550 5 (POTENTIAL).

FT DOMAIN 551 622 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 623 643 6 (POTENTIAL).

FT DOMAIN 644 652 CYTOPLASMIC (POTENTIAL).

FT NP_BIND 87 94 ATP (POTENTIAL).

FT CARBOHYD 410 410 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 585 585 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 592 592 N-LINKED (GLCNAC...) (POTENTIAL).

SQ SEQUENCE 652 AA; 73244 MW; 80CE37ADCC19771E CRC64;



Query Match 19.7%; Score 691.5; DB 1; Length 652; 

Best Local Similarity 28.6%; Pred. No. 1.6e-43; 

Matches 188; Conservative 129; Mismatches 241; Indels 99; Gaps 16; 

Qy 45 NTLEVRDLNYQVDLASQV- PWFEQLAQFKMPWTSPSCQNSCELGI-QNLSFKVRSGQMLA 102 

: : I I : : I I : : : I I I I III : I : : : I : I I I : : 

Db 37 HSLGVLHVSYSV — SNRVGPW WNIKSCQQKWDRQILKDVSLYIESGQIMC 84 

Qy 103 IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLT 162 

I : I II I I : : I I I I : I I I : : : : I I : | : : | | : I : I I 

Db 85 ILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCELRRDQFQDCFSYVLQSDVFLSSLT 144

Qy 163 VRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVS 222 

I I I I I : I : I I : I : I : I I I : I I I II : I : I : I I I I I I I I 

Db 145 VRETLRYTAMLALCRS-SADFYNKPCVTAVTytTELSLSHVADQMIGSYN 203 

Qy 223 IGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDL 282 

I I I I : I : : : I I I I I : I I I I I : : I I : I I : : I : I : : : : I I I I I : : I : I I 
Db 204 IAAQLLQDPKVT^LDEPTTGLDCMTANQIVLLLAELARRDRIVIVTIHQPRSELFQHFDK 263 

Qy 283 VLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKA 342

: : : I I : : I : I : : I I I I I I : I I I I I I : I I I I : I : I I I : I : I : : 
Db 264 IAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRV 323 

Qy 343 QSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTT 401 

|| || I I : : : : : : : I : I I I I I : 

Db 324 QMLECAFKE SDIYHKI-LENIERARYLKTLPT VPFKTKDPPGMFGKLGV 371 

Qy 402 LIRRQISNDFRDLPTLLIHGAEACLMSMTIGF — LYFGHGSIQLSFMDTAALLFMIGALI 459 

|:|| I I : : : : : : I : : I I : : : : : I I I : : 
Db 372 LLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQLVGAT 431

Qy 460 PFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLA 519 

I : : | : : : I I : I : I I I I I : I II : I : II 

Db 432 PYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIATVIFSSVCYWTL 491 

Qy 520 NLRPGLQPFLLHFLLWLWFCCRIMALAAAALLPTFHMASFFSNAL 566

I I : I : I I I I : I : I 

Db 492 GLYPEVARF GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSI 534 

Qy 567 YNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLT 621 

: : | | : | : : : : I : : I I I : : I I : II 
Db 535 VALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEF YGL NFT 587 

Qy 622 IAVSGDKILSAMELDSYPLYAI YLIVIGLSGGFMVL 657 

I : I : : I : I I : I I : I : : I 

Db 588 CGGSNTSMLNHPMCAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVIL 638



RESULT 7 
WHIT_DROME 

ID WHIT_DROME STANDARD; PRT; 687 AA. 

AC P10090; Q9V3A2; Q9XY33; 

DT 01-MAR-1989 (Rel. 10, Created) 

DT 01-NOV-1991 (Rel. 20, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 



DE White protein. 

GN W OR EG:BACN33B1.1 OR CG2759.

OS Drosophila melanogaster (Fruit fly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; 

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; 

OC Ephydroidea; Drosophilidae; Drosophila. 

OX NCBI_TaxID=7227; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Head; 

RX MEDLINE=90221897; PubMed=2109311;

RA Pepling M., Mount S.M.;

RT "Sequence of a cDNA from the Drosophila melanogaster white gene."; 

RL Nucleic Acids Res. 18:1633-1633(1990). 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=85134865; PubMed=6084717;

RA O'Hare K., Murphy C., Levis R., Rubin G.M.;

RT "DNA sequence of the white locus of Drosophila melanogaster."; 

RL J. Mol. Biol. 180:437-455(1984). 

RN [3] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=21100348; PubMed=11156992;

RA Lukacsovich T., Asztalos Z., Awano W., Baba K., Kondo S., Niwa S.,

RA Yamamoto D.;

RT "Dual-tagging gene trap of novel genes in Drosophila melanogaster."; 

RL Genetics 157:727-742(2001). 

RN [4] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Berkeley; 

RX MEDLINE=20196006; PubMed=10731132;

RA Adams M.D., Celniker S.E., Holt R.A., Evans C.A., Gocayne J.D.,

RA Amanatides P.G., Scherer S.E., Li P.W., Hoskins R.A., Galle R.F.,

RA George R.A., Lewis S.E., Richards S., Ashburner M., Henderson S.N.,

RA Sutton G.G., Wortman J.R., Yandell M.D., Zhang Q., Chen L.X.,

RA Brandon R.C., Rogers Y.-H.C., Blazej R.G., Champe M., Pfeiffer B.D.,

RA Wan K.H., Doyle C., Baxter E.G., Helt G., Nelson C.R., Miklos G.L.G.,

RA Abril J.F., Agbayani A., An H.-J., Andrews-Pfannkoch C., Baldwin D.,

RA Ballew R.M., Basu A., Baxendale J., Bayraktaroglu L., Beasley E.M.,

RA Beeson K.Y., Benos P.V., Berman B.P., Bhandari D., Bolshakov S.,

RA Borkova D., Botchan M.R., Bouck J., Brokstein P., Brottier P.,

RA Burtis K.C., Busam D.A., Butler H., Cadieu E., Center A., Chandra I.,

RA Cherry J.M., Cawley S., Dahlke C., Davenport L.B., Davies P.,

RA de Pablos B., Delcher A., Deng Z., Mays A.D., Dew I., Dietz S.M.,

RA Dodson K., Doup L.E., Downes M., Dugan-Rocha S., Dunkov B.C., Dunn P.,

RA Durbin K.J., Evangelista C.C., Ferraz C., Ferriera S., Fleischmann W.,

RA Fosler C., Gabrielian A.E., Garg N.S., Gelbart W.M., Glasser K.,

RA Glodek A., Gong F., Gorrell J.H., Gu Z., Guan P., Harris M.,

RA Harris N.L., Harvey D.A., Heiman T.J., Hernandez J.R., Houck J.,

RA Hostin D., Houston K.A., Howland T.J., Wei M.-H., Ibegwam C.,

RA Jalali M., Kalush F., Karpen G.H., Ke Z., Kennison J.A., Ketchum K.A.,

RA Kimmel B.E., Kodira C.D., Kraft C., Kravitz S., Kulp D., Lai Z.,

RA Lasko P., Lei Y., Levitsky A.A., Li J.H., Li Z., Liang Y., Lin X.,

RA Liu X., Mattei B., McIntosh T.C., McLeod M.P., McPherson D.,

RA Merkulov G., Milshina N.V., Mobarry C., Morris J., Moshrefi A.,

RA Mount S.M., Moy M., Murphy B., Murphy L., Muzny D.M., Nelson D.L.,

RA Nelson D.R., Nelson K.A., Nixon K., Nusskern D.R., Pacleb J.M.,

RA Palazzolo M., Pittman G.S., Pan S., Pollard J., Puri V., Reese M.G.,

RA Reinert K., Remington K., Saunders R.D.C., Scheeler F., Shen H.,

RA Shue B.C., Siden-Kiamos I., Simpson M., Skupski M.P., Smith T.,

RA Spier E., Spradling A.C., Stapleton M., Strong R., Sun E.,

RA Svirskas R., Tector C., Turner R., Venter E., Wang A.H., Wang X.,

RA Wang Z.-Y., Wassarman D.A., Weinstock G.M., Weissenbach J.,

RA Williams S.M., Woodage T., Worley K.C., Wu D., Yang S., Yao Q.A.,

RA Ye J., Yeh R.-F., Zaveri J.S., Zhan M., Zhang G., Zhao Q., Zheng L.,

RA Zheng X.H., Zhong F.N., Zhong W., Zhou X., Zhu S., Zhu X., Smith H.O.,

RA Gibbs R.A., Myers E.W., Rubin G.M., Venter J.C.;

RT "The genome sequence of Drosophila melanogaster.";

RL Science 287:2185-2195(2000). 

RN [5] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Oregon-R; 

RX MEDLINE=20196011; PubMed=10731137;

RA Benos P.V., Gatt M.K., Ashburner M., Murphy L., Harris D.,

RA Barrell B.G., Ferraz C., Vidal S., Brun C., Demailles J., Cadieu E.,

RA Dreano S., Gloux S., Lelaure V., Mottier S., Galibert F., Borkova D.,

RA Minana B., Kafatos F.C., Louis C., Siden-Kiamos I., Bolshakov S.,

RA Papagiannakis G., Spanos L., Cox S., Madueno E., de Pablos B.,

RA Modolell J., Peter A., Schoettler P., Werner M., Mourkioti F.,

RA Beinert N., Dowe G., Schaefer U., Jaeckle H., Bucheton A.,

RA Callister D.M., Campbell L.A., Darlamitsou A., Henderson N.S.,

RA McMillan P.J., Salles C., Tait E.A., Valenti P., Saunders R.D.C.,

RA Glover D.M.;

RT "From sequence to chromosome: the tip of the X chromosome of D. 

RT melanogaster."; 

RL Science 287:2220-2222(2000). 

RN [6] 

RP SEQUENCE OF 224-331 FROM N.A. 

RX MEDLINE=89339145; PubMed=2503416; 

RA Tearle R.G., Belote J.M., McKeown M., Baker B.S., Howells A.J.;

RT "Cloning and characterization of the scarlet gene of Drosophila 

RT melanogaster."; 

RL Genetics 122:595-606(1989). 

CC -!- FUNCTION: Part of a membrane-spanning permease system necessary 
CC for the transport of pigment precursors into pigment cells 

CC responsible for eye color. White dimerizes with brown for the

CC transport of guanine and with scarlet for the transport of 

CC tryptophan. 

CC -!- SUBUNIT: Heterodimer of white with either brown or scarlet. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X51749; CAA36038.1; 

DR EMBL; X02974; CAA26716.1; 

DR EMBL; AB028139; BAA78210.1; -. 

DR EMBL; AE003425; AAF45826.1; -. 



DR EMBL; AL133506; CAB65847.1; 

DR EMBL; X76202; CAA53795.1; 

DR PIR; S08635; FYFFW. 

DR FlyBase; FBgn0003996; w. 

DR GO; GO:0004888; F:transmembrane receptor activity; NAS.

DR GO; GO:0006727; P:ommochrome biosynthesis; IMP.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 

FT NP_BIND 130 137 ATP (BY SIMILARITY).

FT TRANSMEM 435 453 POTENTIAL.

FT TRANSMEM 465 485 POTENTIAL.

FT TRANSMEM 515 533 POTENTIAL.

FT TRANSMEM 542 563 POTENTIAL.

FT TRANSMEM 576 594 POTENTIAL.

FT TRANSMEM 659 678 POTENTIAL.

FT CONFLICT 25 29 GDSGA -> LIFEIPYHCRVTAD (IN REF. 2 AND

FT 3).

FT CONFLICT 49 49 L -> R (IN REF. 4 AND 5).

FT CONFLICT 335 371 VGAQCPTNYNPADFYVQVLAVVPGREIESRDRIAKIC ->

FT ITLHLNSYPAWVPSVLPTTIRRTFTYRCWPLCPDGRSSPVI
FT GSPRYG (IN REF. 3).

SQ SEQUENCE 687 AA; 75672 MW; 24AFAD799DE0D396 CRC64;

Query Match 18.7%; Score 656; DB 1; Length 687; 

Best Local Similarity 30.3%; Pred. No. 7.3e-41; 

Matches 178; Conservative 113; Mismatches 265; Indels 32; Gaps 10 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGG--KIKSGQIWINGQPSSPQLVR 145

: : I : I : : I I : : I I I I I : : II : : I I II : I I I I : : : 

Db 113 LKNVCGVAYPGELLAVMGSSGAGKTTLLNALAFRSPQGIQVSPSGMRLLNGQPVDAKEMQ 172

Qy 146 KCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRV 205

I : I : I : : : I I I I I I I : I : II : II I I : I I I I I : I I : 

Db 173 ARCAYVQQDDLFIGSLTAREHLIFQAMVRMPRHLTYRQRVARVDQVIQELSLSKCQHTII 232

Qy 206 G-NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRL 264

I I : II I I I I I : I : : : I : I : I I I I I I I I I I I I I I I : : I : I : I : : : 

Db 233 GVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFTAHSVVQVLKKLSQKGKT 292

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDL 324

I : : : : I I I I : : I I I I : I I I I : I I I : I : : I I I I I I I II I : 

Db 293 VILTIHQPSSELFELFDKILLMAEGRVAFLGTPSEAVDFFSYVGAQCPTNYNPADFYVQV 352

Qy 325 TSIDRRSREQELATREKAQSLAALF-LEKV-RDLDDFLWKAETKDLDEDTCVESSVTPLD 382 

: : : I : : I : : : I : I I I I : : I M : I : : II: 
Db 353 LAV VPGREIESRDRIAKICDNFAISKVARDMEQLL ATKNLEK PLE 397 

Qy 383 TNCLPSP TKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGH 438 

| | I I :: I : :: : : : :::: M :: I 



Db 398 QPENGYTYKATWFMQFRAVLWRSWLSVLKEPLLVKVRLIQTTMVAILIGLIFLGQ 452



Qy 439 GSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKIL 498 

I : I : : I : : I : I : II: I II III: 

Db 453 QLTQVGVMNINGAIFLFLTNMTFQNVFATINVFTSELPVFMREARSRLYRCDTYFLGKTI 512 

Qy 499 GELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLWLWFCCRIMALAAA 558 

III :: : I : I I I : I I I I I ■ ' I 

Db 513 AELPLFLTVPLVFTAIAYPMIGLRAGVLHFFNCLALVTLVANVSTSFGYLISCASSSTSM 572 

Qy 559 AS FFSNAL YNS FYLAGGFMINLS S LWTVPAWI S KVS FLRWCFEGLMKI QFS RRTYKM 615 

I : I I I I I : I I : I : I : I : I : III: I : : 

Db 573 ALSVGPPVIIPFLLFGGFFLNSGSVPVYLKWLSYLSWFRYANEGLLINQWADVEPGEISC 632 

Qy 616 PLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYVSLR 663 

II II II : : I I : : I I I I I :: I I 

Db 633 TSSNTTCPSSGKVILETLNFSAADLPLDYVGLAILIVSFRVLAYIALR 680



RESULT 8
YOH5_YEAST

ID YOH5_YEAST STANDARD; PRT; 1294 AA.

AC Q08234; Q08233;

DT 01-NOV-1997 (Rel. 35, Created)

DT 16-OCT-2001 (Rel. 40, Last sequence update)

DT 16-OCT-2001 (Rel. 40, Last annotation update)

DE Probable ATP-dependent transporter YOL074C/YOL075C.

GN YOL074C/YOL075C.

OS Saccharomyces cerevisiae (Baker's yeast).

OC Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;

OC Saccharomycetales; Saccharomycetaceae; Saccharomyces.

OX NCBI_TaxID=4932;

RN [1]

RP SEQUENCE FROM N.A.

RX MEDLINE=97321807; PubMed=9178509;

RA Tzermia M., Katsoulou C., Alexandraki D.;

RT "Sequence analysis of a 33.2 kb segment from the left arm of yeast

RT chromosome XV reveals eight known genes and ten new open reading

RT frames including homologues of ABC transporters, inositol

RT phosphatases and human expressed sequence tags.";

RL Yeast 13:583-589(1997).

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Potential).

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily.

CC

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; Z74817; CAA99085.1; -.

DR EMBL; Z74816; CAA99084.1; -.

DR PIR; S77690; S77690.

DR GermOnline; 143497; -.

DR SGD; S0005435; YOL075C.





DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 2.

DR ProDom; PD000006; ABC_transporter; 2.

DR SMART; SM00382; AAA; 2.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 2.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 2.

KW Hypothetical protein; ATP-binding; Transmembrane; Glycoprotein; 

KW Transport; Repeat. 



FT TRANSMEM 376 396 POTENTIAL.

FT TRANSMEM 496 516 POTENTIAL.

FT TRANSMEM 531 551 POTENTIAL.

FT TRANSMEM 605 625 POTENTIAL.

FT TRANSMEM 1039 1059 POTENTIAL.

FT TRANSMEM 1121 1141 POTENTIAL.

FT TRANSMEM 1267 1287 POTENTIAL.

FT NP_BIND 62 69 ATP (POTENTIAL).

FT NP_BIND 727 734 ATP (POTENTIAL).

FT CARBOHYD 41 41 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 86 86 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 101 101 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 151 151 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 341 341 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 349 349 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 371 371 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 528 528 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 983 983 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 1062 1062 N-LINKED (GLCNAC...) (POTENTIAL).

SQ SEQUENCE 1294 AA; 145157 MW; C555500A45E92* HE CRC64;

Query Match 18.6%; Score 653; DB 1; Length 1294;



Best Local Similarity 30.1%; Pred. No. 2.7e-40; 

Matches 171; Conservative 111; Mismatches 239; Indels 48; Gaps 13; 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQI 132

: I : M : : I : : I I I I : : I I : I : : II : I I 
Db 45 VNTFSMDLPSGSVMAVMGGSGSGKTTLLNVLASKISGGLTHNGSIRYVLEDTGSEPNETE 104 

Qy 133 WINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKR- 187 

: : I I I : I : : I : I I I I I I I I I I : : I : : : I I : 

Db 105 PKRAHLDGQ-DHPIQKHVIMAYLPQQDVLSPRLTCRETLKFAADLKL NSSERTKKL 159 

Qy 188 -VEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSF 246

II : I II I : I I I I II: I I I I I I I : II : I II I : : II I : II I I I : I I I : : 
Db 160 MVEQLIEELGLKDCADTLVGDNSHRGLSGGEKRRLSIGTQMISNPSIMFLDEPTTGLDAY 219 

Qy 247 TAHNLVKTLSRLAK-GNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFT 305 

: I : : I I I : I I I I : : I : I I I I II I I I I : : : I : I : : I I 

Db 220 SAFLVIKTLKKLAKEDGRTFIMSIHQPRSDILFLLDQVCILSKGNVVYCDKMDNTIPYFE 279

Qy 306 AIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFLWKAET 365 

: I I I I : I I I I : : : I I : I : I I I : : I I I : : II : : I : : 

Db 280 SIGYHVPQLVNPADYFIDLSSVDSRSDKEEAATQSRLNSL IDHWHD YERTH 330 

Qy 366 KDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEAC 425

I : : : | : I : : I : : I : I I I I I : I I I I : II 

Db 331 LQLQAESYI-SNATEIQIQNM--TTRLP-FWKQVTVLTRRNFKLNFSDYVTLISTFAEPL 386



Qy 426 LMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIP--FNVILDVISKCYSERAMLYYELE 483

: : | :: | : : : I :: : : I I : I : I 

Db 387 IIGTVCGWIYYKPDKSSIGGLRTTTACLYASTILQCYLYLLFDTYRLCEQDIALYDRERA 446

Qy 484 DGLYTTGPYFFA-KILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLWLVVFCC 542 

: I I : I I I I : I : I : I I : : I : I I : I : I I 

Db 447 EGSVTPLAFIVARKISLFLSDDFAMTMIFVSITYFMFGLEADARKFFYQFAWFLCQLSC 506 

Qy 543 RIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEG 602

: : : : I : III I : : I I : I : I I : : I : I 

Db 507 S GLSMLS VAVS RDFS KAS LVGNMT FTVLSMGCGFFWAKVMPVYVRWI KYI AFTWYS FGT 566 

Qy 603 LMKIQFSRR TYKMPLGNLTIAVSG 626 

II I : 111:11 
Db 567 LMSSTFTNSYCTTDNLDECLGNQILEVYG 595 



RESULT 9 
ABG2_HUMAN

ID ABG2_HUMAN STANDARD; PRT; 655 AA.

AC Q9UNQ0; O95374; Q9BY73; Q9NUS0;

DT 16-OCT-2001 (Rel. 40, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 2 (Placenta-specific ATP-

DE binding cassette transporter) (Breast cancer resistance protein).

GN ABCG2 OR ABCP OR BCRP OR BCRP1.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Placenta; 

RX MEDLINE=99065313; PubMed=9850061;

RA Allikmets R., Schriml L.M., Hutchinson A., Romano-Spica V., Dean M.;

RT "A human placenta-specific ATP-binding cassette gene (ABCP) on 

RT chromosome 4q22 that is involved in multidrug resistance."; 

RL Cancer Res. 58:5337-5339(1998). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Breast cancer; 

RX MEDLINE=99080071; PubMed=9861027;

RA Doyle L.A., Yang W., Abruzzo L.V., Krogmann T., Gao Y., Rishi A.K., 

RA Ross D.D.; 

RT "A multidrug resistance transporter from human MCF-7 breast cancer 

RT cells."; 

RL Proc. Natl. Acad. Sci. U.S.A. 95:15665-15670(1998). 

RN [3] 

RP ERRATUM. 

RA Doyle L.A., Yang W., Abruzzo L.V., Krogmann T., Gao Y., Rishi A.K.,

RA Ross D.D.; 

RL Proc. Natl. Acad. Sci. U.S.A. 96:2569-2569(1999). 

RN [4] 

RP SEQUENCE FROM N.A. 

RA Kage K., Tsukahara S., Sugiyama T., Asada S., Ishikawa E., Tsuruo T.,

RA Sugimoto Y.;

RT "Breast cancer resistance protein constitutes a 140-kDa complex as a

RT homodimer.";

RL Submitted (MAR-2001) to the EMBL/GenBank/DDBJ databases.

RN [5] 

RP SEQUENCE OF 198-655 FROM N.A. 

RC TISSUE=Placenta; 

RA Isogai T., Ota T., Hayashi K., Sugiyama T., Otsuki T., Suzuki Y.,

RA Nishikawa T., Nagai K., Sugano S., Shiratori A., Sudo H.,

RA Wagatsuma M., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M.,

RA Takahashi M., Chiba Y., Ishida S., Murakawa K., Ono Y., Takiguchi S.,

RA Watanabe S., Kimura K., Murakami K., Ishii S., Kawai Y., Saito K.,

RA Yamamoto J., Wakamatsu A., Nakamura Y., Nagahari K., Masuho Y.,

RA Ninomiya K., Iwayanagi T.;

RT "NEDO human cDNA sequencing project.";

RL Submitted (FEB-2000) to the EMBL/GenBank/DDBJ databases.

RN [6] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207;

RA Schmitz G., Langmann T., Heimerl S.;

RT "Role of ABCG1 and other ABCG family members in lipid metabolism.";

RL J. Lipid Res. 42:1513-1520(2001). 

CC -!- FUNCTION: Xenobiotic transporter that appears to play a major role 

CC in the multidrug resistance phenotype of a specific MCF-7 breast 

CC cancer cell line. When overexpressed, the transfected cells become 

CC resistant to mitoxantrone, daunorubicin and doxorubicin, display 

CC diminished intracellular accumulation of daunorubicin, and 

CC manifest an ATP-dependent increase in the efflux of rhodamine 123. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 

CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF103796; AAD09188.1; 

DR EMBL; AF098951; AAC97367.1; 

DR EMBL; AB056867; BAB39212.1; -. 

DR EMBL; AK002040; BAA92050.1; -. 

DR Genew; HGNC:74; ABCG2.

DR MIM; 603756; -.

DR GO; GO:0016021; C:integral to membrane; TAS.

DR GO; GO:0005524; F:ATP binding; TAS.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; TAS.

DR GO; GO:0005215; F:transporter activity; TAS.

DR GO; GO:0008559; F:xenobiotic-transporting ATPase activity; TAS.

DR GO; GO:0009315; P:drug resistance; TAS.

DR GO; GO:0006810; P:transport; TAS.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.



DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transmembrane; Transport.

FT DOMAIN 1 395 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 396 416 POTENTIAL.

FT DOMAIN 417 428 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 429 449 POTENTIAL.

FT DOMAIN 450 477 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 478 498 POTENTIAL.

FT DOMAIN 499 506 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 507 527 POTENTIAL.

FT DOMAIN 528 535 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 536 556 POTENTIAL.

FT DOMAIN 557 630 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 631 651 POTENTIAL.

FT DOMAIN 652 655 CYTOPLASMIC (POTENTIAL).

FT NP_BIND 80 87 ATP (POTENTIAL).

FT CARBOHYD 418 418 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 557 557 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 596 596 N-LINKED (GLCNAC...) (POTENTIAL).

FT CONFLICT 24 24 V -> A (IN REF. 2 AND 4).

FT CONFLICT 166 166 E -> Q (IN REF. 2 AND 4).

FT CONFLICT 208 208 F -> S (IN REF. 1).

FT CONFLICT 315 316 MISSING (IN REF. 5).

FT CONFLICT 482 482 R -> T (IN REF. 2).

SQ SEQUENCE 655 AA; 72343 MW; 89A6D3511DC5CCE0 CRC64;



Query Match 18.3%; Score 640.5; DB 1; Length 655; 

Best Local Similarity 27.9%; Pred. No. 9.7e-40; 

Matches 175; Conservative 131; Mismatches 254; Indels 67; Gaps 17; 

Qy 80 CQNSCELGI-QNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQP 138 

I : I I I : : : : I : I I : I : I I : : I I I I I : I : I I : I I I I 

Db 55 CRKPVEKEILSNINGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPSGL-SGDVLINGAP 112 

Qy 139 SSPQLVRKC-VAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRL 197 

II : I I : : : . I I I I I I I I : I I I : : : : : I : I I I I I 
Db 113 RPANF--KCNSGYVVQDDVVMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIEELGL 170

Qy 198 RQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSR 257 

: I I : : I I : : I I : I I I I I : I I I I : : I : : I II I I I I I : II II I I : : : I I 
Db 171 DKVADSKVGTQFIRGVSGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKR 230 

Qy 258 LAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNP 317

: : I | : : I : I I I I I I : I II : I : I I : : I I I : I I : I I I I : I I 
Db 231 MSKQGRTIIFSIHQPRYSIFKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNP 290

Qy 318 ADFYVDLTSIDRR SREQELATRE — KAQSLAALFLEKVRDL — DDFLWKAETK 366 

I I I : : I : : I : I I : : I : : I I : : : : : I ■ I I I 

Db 291 ADFFLDIINGDSTAV7^NREEDFKATEIIEPSKQDKPLIEKL7VEIYVNSSFYK-ETKAEL 349 

Qy 367 DLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

: : I : : : I I : : I I : : 

Db 350 HQLSGGEKKKKITVFKEISYTTSFC HQLRWVS KRS FKNLLGNPQAS I AQ 398 



Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYS 473



: : : I I : I I I : : I : I I : : : I : I 

Db 399 IIVTWLGLVIGAIYFGLKNDSTGIQNRAGVLFFL TTNQCFS SVSAVEL 447 

Qy 474 ERAMLYYELEDGLYTTGPYFFAKILGE-LPEHCAYIIIYGMPTYWLANLRPGLQPFL 529 

I : : : I II II I : I : I I I I : I : : hi I 

Db 448 FWEKKLFIHEYISGYYRVSSYFLGKLLSDLLPMRMLPSIIFTCIVYFMLGLKPKADAFF 507 

Qy 530 LHFLLWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLW 589 

: : : I : I I I I I I : I : : : | : : | | : : : : : I 

Db 508 VMMFTLMMVAYSASSMALAIAAGQSVVSVATLLMTICFVFMMIFSGLLVNLTTIASWLSW 567

Qy 590 ISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLT IAVSGDKI L — SAMELDS YPL 640 

: I I : I I : | : : | | | : I : : I : : I : I 

Db 568 LQYFSIPRYGFTALQHNEFLGQNF-CPGLNATGNNPCNYATCTGEEYLVKQGIDLSPWGL 626 

Qy 641 YAIYLIVIGLSGGFMVLYYVSLRFIKQ 667 

: : : : : | : : | : | | : I : 
Db 627 WKNHVALACMIVIFLTIAYLKLLFLKK 653



RESULT 10 
WHIT_ANOGA 

ID WHIT_ANOGA STANDARD; PRT; 695 AA. 

AC Q27256; Q17006; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Anopheles gambiae (African malaria mosquito) . 

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; 

OC Neoptera; Endopterygota; Diptera; Nematocera; Culicoidea; Anopheles. 

OX NCBI_TaxID=7165;

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Suakoko / G3; 

RX MEDLINE=96423158; PubMed=8825759; 

RA Besansky N.J., Bedell J.A., Benedict M.Q., Mukabayire O., Hilfiker D.,

RA Collins F.H.; 

RT "Cloning and characterization of the white gene from Anopheles 

RT gambiae."; 

RL Insect Mol. Biol. 4:217-231(1995). 

CC -!- FUNCTION: May be part of a membrane-spanning permease system 

CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; U29486; AAC46995.1; -.



DR EMBL; U29485; AAC46994.1; 

DR EMBL; U29484; AAC47423.1; 

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR008965; Cellul_bind.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 



FT NP_BIND 133 140 ATP (POTENTIAL).

FT NP_BIND 288 295 ATP (POTENTIAL).

FT TRANSMEM 444 464 POTENTIAL.

FT TRANSMEM 474 494 POTENTIAL.

FT TRANSMEM 524 544 POTENTIAL.

FT TRANSMEM 552 572 POTENTIAL.

FT TRANSMEM 581 601 POTENTIAL.

FT TRANSMEM 669 689 POTENTIAL.

FT CARBOHYD 472 472 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 645 645 N-LINKED (GLCNAC...) (POTENTIAL).

FT CONFLICT 100 100 N -> S (IN REF. 1; AAC47423).

FT CONFLICT 691 693 SRS -> YAR (IN REF. 1; AAC47423).

SQ SEQUENCE 695 AA; 77218 MW; EE8B9517239B2961 CRC64;



Query Match 17.9%; Score 627; DB 1; Length 695; 

Best Local Similarity 26.3%; Pred. No. 1e-38;

Matches 189; Conservative 128; Mismatches 289; Indels 112; Gaps 17; 



Qy 


14 


Db 


10 


Qy 


67 


Db 


70 


Qy 


99 


Db 


127 


Qy 


157 


Db 


187 


Qy 


216 


Db 


247 


Qy 


276 


Db 


307 


Qy 


336 



I : I : I I I | : : | | : I I : I : I : I I 



-TSPSC QNSCELG IQNLSFKVRSG 98 

II : I I : : I : : : I I 



: : I I : : I I I I I : : I I : : I III : : I I I : : : I I : I : I 



1:1111111:1:1 : : I I : : I : I I I : I I I I : I 



I I I : I :: : I : I : I : I I I I I I II I I I I : : : : I : I :::::: I I I I 



: : | | | : | | : I : I I : : : I : : I I I I I I I I M I : : I 

LYCLFDKILLVAEGRVAFLGSPYQSAEFFSQLGIPCPPNYNPADFYVQMLAIAPAK 362 



390 



II : I : : I : I : I = 



Db 



363 



EAECRDMI KKI CDS FAVS P I AREVLETAS VAGKG 396 



Qy 391 -KMPGAVQ QFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGF 433 

| : I I I : :■ I : : I : : : : : : I I 

Db 397 MDEPYMLQQVEGVGSTGYRSSWWTQFYCILWRSWLSVLKDPMLVKVRLLQTAMVATLIGS 456 

Qy 434 LYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYF 493

: | | | I | : ||: : I : I I : : I : I II I I 

Db 457 IYFGQVLDQDGVMNINGSLFLFLTNMTFQNVFAVINVFSAELPVFLREKRSRLYRVDTYF 516 

Qy 494 FAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLWLWFCCRIMALAAAALL 553 

I : I I I I :: I I : I I I : I : I I I = 

Db 517 LGKTIAELPLFIAVPFVFTSITYPMIGLRTGATHYLTTLFIVTLVANVSTSFGYLISCAS 576 

Qy 554 PT FHMAS FFSNALYNS FYLAGGFMINLS S LWTVPAWI S KVS FLRW CFEGLMKIQFS- 609 

: || : I : I II : I : I III: : I : I I I I : hi 

Db 577 SSISMALSVGPPWIPFLIFGGFFLNSAS VPAYFKYLSYLSWFRYANEALLINQWST 633 

Qy 610 RRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLY 658 

| | : : :: II h I I I h I : h 

Db 634 WDGEIACTRANVTCPRSEIILETFNFRV-EDFALDIACLFA — LIVLFRLGALLCLW 688 



RESULT 11 
WHIT_CERCA 

ID WHIT_CERCA STANDARD; PRT; 679 AA. 

AC Q17320; 

DT 01-NOV-1997 (Rel. 35, Created)

DT 01-NOV-1997 (Rel. 35, Last sequence update)

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Ceratitis capitata (Mediterranean fruit fly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;

OC Tephritoidea; Tephritidae; Ceratitis. 

OX NCBI_TaxID=7213; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=96123276; PubMed=8533095;

RA Zwiebel L.J., Saccone G., Zacharopoulou A., Besansky N.J.,

RA Favia G., Collins F.H., Louis C., Kafatos F.C.;

RT "The white gene of Ceratitis capitata: a phenotypic marker for 

RT germline transformation."; 

RL Science 270:2005-2007(1995). 

CC -!- FUNCTION: May be part of a membrane-spanning permease system 

CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X89933; CAA61998.1; 

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 



FT NP_BIND 121 128 ATP (BY SIMILARITY).

FT TRANSMEM 427 445 POTENTIAL.

FT TRANSMEM 457 477 POTENTIAL.

FT TRANSMEM 507 525 POTENTIAL.

FT TRANSMEM 534 555 POTENTIAL.

FT TRANSMEM 568 586 POTENTIAL.

FT TRANSMEM 651 670 POTENTIAL.

FT CARBOHYD 628 628 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 643 643 N-LINKED (GLCNAC...) (POTENTIAL).

SQ SEQUENCE 679 AA; 75145 MW; 3F9CBC78A835C4CC CRC64;


Query Match 17.8%; Score 623.5; DB 1; Length 679;

Best Local Similarity 28.3%; Pred. No. 1.8e-38;

Matches 169; Conservative 112; Mismatches 264; Indels 53;



Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGG-KIKSGQI-WINGQPSSPQLVR 145 

: : I I | : : | | : : | | | | | : : | | : | I : I I : I I I : : : 

Db 104 LKNDSGVAYPGELLAVMGSSGAGKTTLLNASAFRSSKGVQISPSTIRMLNGHPVDAKEMQ 163 

Qy 146 KCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRV 205 



Db 




Qy 206 G-NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRL 264

Db 224 GVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFMAHSWQVLKKLSQKGKT 283

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDL 324

Db 284 VILTIHQPSSELFELFDKILLMAEGRVAFLGTPGEAVDFFSYIGATCPTNYTPADFYVQV 343

Qy 325 TS IDRRSREQEL- ATREKAQSLAALFLEK VRDLDDFLWKAETKD 367

: : : | | : : : | | | : | I : : : : I I

Db 344 LAWPGREVESRDRVAKICDNFAVGKVSREMEQNFQKLVKSNGFGKEDENEYTYKASW -- 401

Qy 368 LDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLM 427

| | :: | : :: : : : ::

Db 402 FMQFRAVLWRSWLSVLKEPLLVKVRLLQTTMV 433

Qy 428 SMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLY 487

. . i i . . i i . i . . i . i i • • i • i ii

Db 434 AVLIGLIFLGQQLTQVGVMNINGAIFLFLTNMTFQNSFATITVFTTELPVFMRETRSRLY 493

Qy 488 TTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMAL 547

111:111 :: I I I I I I : I I I I I

Db 494 RCDTYFLGKTIAELPLFLWPFLFTAIAYPLIGLRPGVDHFFTALALVTLVANVSTSFGY 553

Qy 548 AAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQ 607

: : II : I I I I I : I I : I : I : I : I : III: I 

Db 554 LISCACSSTSMALSVGPPVIIPFLLFGGFFLNSGSVPVYFKWLSYLSWFRYANEGLLINQ 613 

Qy 608 FS RRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAI YLIVIGLSGGFMVLYYVSL 662 

: : III I I : I I : : I : : : I I I : I : : I 

Db 614 WADVKPGEITCTLSNTTCPSSGEVILETLNFSASDLPFDFIGLALLIVGFRISAYIAL 671 



RESULT 12 
ABG1_MOUSE

ID ABG1_MOUSE STANDARD; PRT; 666 AA.

AC Q64343; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 1 (White protein homolog) 

DE (ATP-binding cassette transporter 8).

GN ABCG1 OR ABC8 OR WHT1.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=97186700; PubMed=9034316; 

RA Croop J.M., Tiller G.E., Fletcher J.A., Lux M.L., Raab E.,

RA Goldenson D., Son D., Arciniegas S., Wu R.;

RT "Isolation and characterization of a mammalian homolog of the

RT Drosophila white gene.";

RL Gene 185:77-85(1997).

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=DBA/2; 

RX MEDLINE=96359154; PubMed=8703120;

RA Savary S., Denizot F., Luciani M.-F., Mattei M.-G., Chimini G.;

RT "Molecular cloning of a mammalian ABC transporter homologous to

RT Drosophila white gene.";

RL Mamm. Genome 7:673-676(1996).

RN [3] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=21092576; PubMed=11162488;

RA Lorkowski S., Rust S., Engel T., Jung E., Tegelkamp K., Galinski E.A.,

RA Assmann G., Cullen P.;

RT "Genomic sequence and structure of the human ABCG1 (ABC8) gene.";

RL Biochem. Biophys. Res. Commun. 280:121-131(2001).

RN [4] 

RP INDUCTION, AND PROBABLE FUNCTION. 

RX MEDLINE=20261604; PubMed=10799558;

RA Venkateswaran A., Repa J.J., Lobaccaro J.-M.A., Bronson A.,

RA Mangelsdorf D.J., Edwards P.A.;

RT "Human white/murine ABC8 mRNA levels are highly induced in

RT lipid-loaded macrophages. A transcriptional role for specific

RT oxysterols.";



RL J. Biol. Chem. 275:14700-14707(2000). 

RN [5] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207;

RA Schmitz G., Langmann T., Heimerl S.;

RT "Role of ABCG1 and other ABCG family members in lipid metabolism."; 

RL J. Lipid Res. 42:1513-1520(2001). 

CC -!- FUNCTION: Transporter involved in macrophage lipid homeostasis. Is 
CC an active component of the macrophage lipid export complex. Could 

CC also be involved in intracellular lipid transport processes. The 

CC role in cellular lipid homeostasis may not be limited to 

CC macrophages.

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Expressed mainly in brain, thymus, lung, 
CC adrenals, spleen and placenta. Little or no expression in liver, 

CC kidney, heart, muscle or testes. 

CC -!- INDUCTION: Strongly induced in macrophage cell line RAW264.7 
CC during cholesterol influx. Induction is mediated by the liver X 

CC receptor/retinoide X receptor (LXR/RXR) pathway. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U34920; AAB47738.1; -. 

DR EMBL; Z48745; CAA88636.1; -. 

DR EMBL; AF323659; AAK27442.1; -. 

DR MGD; MGI:107704; Abcg1.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.



KW Transport; Lipid transport; ATP-binding; Transmembrane.

FT DOMAIN 1 414 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 415 433 POTENTIAL.

FT DOMAIN 434 444 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 445 465 POTENTIAL.

FT DOMAIN 466 494 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 495 513 POTENTIAL.

FT DOMAIN 514 521 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 522 543 POTENTIAL.

FT DOMAIN 544 555 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 556 574 POTENTIAL.

FT DOMAIN 575 637 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 638 657 POTENTIAL.

FT DOMAIN 658 666 CYTOPLASMIC (POTENTIAL).

FT NP_BIND 118 125 ATP (POTENTIAL).

SQ SEQUENCE 666 AA; 74033 MW; EDDC6AFBD43950B6 CRC64; 

Query Match 17.7%; Score 621; DB 1; Length 666; 

Best Local Similarity 25.4%; Pred. No. 2.7e-38; 

Matches 165; Conservative 137; Mismatches 279; Indels 68; Gaps 16; 

Qy 33 DNSLYFT YSGQPN TLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSC 84

||:|| : I I :| :ll:| I : II::: : 
Db 57 DNN — FTEAQRFSSLPRRAAVNIEFKDLSYSV PEGPWWKKKGYKTL 100 

Qy 85 ELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLV 144

: : : I I I I : : : I I : I I I I : : : I : : : : I I I : I II I : 

Db 101 LKGI SGKFNS GELVAIMGP S GAGKSTLMNI LAG YRETG — MKGAVL I NGMP RDLRC F 155 

Qy 145 RKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTR 204

|| : : I : I I I : I I I : I : I : : I | : | :: : : II I I : I I 

Db 156 RKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQE — KDEGRREMVKEILTALGLLPCANTR 213 

Qy 205 VGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRL 264 

| : | | | | : | : | : : | : : | : | | : : I I I I I I I I I : : I : I I : I I 

Db 214 TGS LSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQWSLMKGLAQGGRS 268 

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDL 324

: : : : I I I : : I I I I : : : : I =11 -I I : I MINIM::: 
Db 269 IVCTIHQPSAKLFELFDQLYVLSQGQCVYRGKVSNLVPYLRDLGLNCPTYHNPADFVMEV 328

Qy 325 TS I DRRSREQELATREKAQSLAALFLEKV RDLDDFLWKAETKDLDEDTCVESSVTPL 381 

| : : I : I : : I s s III : MM 

Db 329 ASGEYGDQNSRLVRAVREGMCDADYKRDLGGDTDVNPFLWH RPAEEDSASMEGCHSF 385 

Qy 382 DTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSI 441 

Ml I I I M : I I : : : I I II I I : 

Db 386 SASCL TQFCILFKRTFLSIMRDSVLTHLRITSHIGIGLLIGLLYLGIGNE 435 

Qy 442 QLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGEL 501 

: : | I : | : : : | : : I : I : I : I I : : : 

Db 436 AKKVXSNSGFLFFSMLFLMFAALMPTVXTFPLEMSVFLREHLNYWYSLKAYYLAKTMADV 495

Qy 502 PEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASF 561

| : : | ||:: I : I I : : : I I : MM 

Db 496 PFQIMFPVAYCSIVYWMTSQPSDAVRFVLFAALGTMTSLVAQSLGLLIGAASTSLQVATF 555

Qy 562 FSNALYNSFYLAGGFMINLSSLWTVPA WI S KVS FLRWC FEGLMKIQF — S RRT YKMP 616 

I ||:: I M I | :| : | : :| : I I I : : : I 

Db 556 VGPVTAIPVLLFSGFFVSFD T I PAYLQWMS YI S YVRYGFEGVI LS I YGLDREDLHCD 612 

Qy 617 LGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYVSLRFI 665

: : | | : : : : : I I : MM : : : : I I I I 

Db 613 IAETCHFQKSEAILRELDVENAKLY-LDFIVLG IFFISLRLI 653 



RESULT 13 
WHIT_LUCCU 

ID WHIT_LUCCU STANDARD; PRT; 677 AA. 

AC Q05360; 

DT 01-FEB-1995 (Rel. 31, Created) 



DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Lucilia cuprina (Greenbottle fly) (Australian sheep blowfly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Oestroidea;

OC Calliphoridae; Lucilia.

OX NCBI_TaxID=7375;

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=97087158; PubMed=8933176;

RA Garcia R.L., Perkins H.D., Howells A.J.;

RT "The structure, sequence and developmental pattern of expression of 

RT the white gene in the blowfly Lucilia cuprina."; 

RL Insect Mol. Biol. 5:251-260(1996). 

RN [2] 

RP SEQUENCE OF 490-584 FROM N.A. 

RX MEDLINE=90264941; PubMed=1971656; 

RA Elizur A., Vacek A.T., Howells A.J.;

RT "Cloning and characterization of the white and topaz eye color genes 

RT from the sheep blowfly Lucilia cuprina."; 

RL J. Mol. Evol. 30:347-358(1990). 

CC -!- FUNCTION: May be part of a membrane-spanning permease system
CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U38899; AAA82057.1; -. 

DR EMBL; X53265; CAA37365.1; -. 

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 



FT NP_BIND 119 126 ATP (POTENTIAL).

FT TRANSMEM 431 451 POTENTIAL.

FT TRANSMEM 456 476 POTENTIAL.

FT TRANSMEM 506 526 POTENTIAL.

FT TRANSMEM 534 554 POTENTIAL.

FT TRANSMEM 563 583 POTENTIAL.

FT TRANSMEM 647 667 POTENTIAL.

SQ SEQUENCE 677 AA; 75365 MW; D16FC11C97EED51D CRC64;



Query Match 17.7%; Score 620.5; DB 1; Length 677; 

Best Local Similarity 29.3%; Pred. No. 3e-38; 

Matches 174; Conservative 119; Mismatches 259; Indels 41; Gaps 13; 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQI-WINGQPSSPQLVR 145

1:1: I : : I I : : I I I I I : : I I : : I I : I : MM : : : 

Db 102 IKNVCGVAYPGELLAVMGSSGAGKTTLLNALAFRSARGVQISPSSVRMLNGHPVDAKEMQ 161 

Qy 146 KCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRV 205 

I : I : I : : : I I I I I I I : I : I I I : I I : : I I : I I : I I : I : I : 
Db 162 ARCAYVQQDDLFIGSLTT^REHLIFQATVRMPRTMTQKQKLQRVDQVIQDLSLIKCQNTII 221 

Qy 206 G-NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRL 264 

I I : I I I I I I I : I : : : I : I : I I I I I I I I I I I I I : : I : I : I : : : 

Db 222 GVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFMAASWQVLKKLSQRGKT 281 

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDL 324

I : :: : I II I :: I I I I : I I I I : II I : I : I I II I I I II I I : 

Db 282 VILTIHQPSSELFELFDKILLMAEGRVAFLGTPVEAVDFFSFIGAQCPTNYNPADFYVQV 341 

Qy 325 TSIDRRSREQELATREKAQSLAALF-LEKV-RDLDDFLWK--AETKDLDEDTCVESSVTP 380

: : : I : : I : : : I : I I I : : : I hi I : I 
Db 342 LAV VPGREI ESRDRI SKI CDNFAVGKVSREMEQNFQKIAAKTDGLQKDD 390 

Qy 381 LDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGS 440 

: I I : I I :: I : :: : : : :::: I I :: 

Db 391 -ETTILYKASWF TQFRAIMWRSWISTLKEPLLVKVRLIQTTMVAVLIGLIFLNQPM 445 

Qy 441 IQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGE 500

I : I : : I : : I : I I : I I : I II I I I I I 

Db 446 TQVGVMNINGAIFLFLTNMTFQNVFAVINVFTSELPVFMRETRSRLYRCDTYFLGKTLAE 505 

Qy 501 LPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMAS 560

II : : I : I I I I : I I I I I I : : I I 

Db 506 LPLFLWPFLFIAIAYPMIGLRPGITHFLSALALVTLVANVSTSFGYLISCASTSTSMAL 565 

Qy 561 FFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLG-- 618

I I I I I : I I : I : I I : I : III: I :: : I 

Db 566 SVGPPLTIPFLLFGGVFLNSGSVPVYFKWLSYFSWFRYANEGLLINQWA DVQPGEI 621 

Qy 619 NLTIAVSGDKILSAMELD SYPLYAIYLIVIGLSGGFMVLYYVSLR 663 

II II I : :: M : I ::: | : I I : : 

Db 622 TCTSTNTTCPSSGXVXLETLNFRDKFTFRLYGLILLIL I FRI AGYVAXK 670 



RESULT 14 
ABG1_HUMAN 

ID ABG1_HUMAN STANDARD; PRT; 678 AA.

AC P45844; Q9BXK6; Q9BXK7; Q9BXK8; Q9BXK9; Q9BXL0; Q9BXL1; Q9BXL2; 
AC Q9BXL3; Q9BXL4; 

DT 01-NOV-1995 (Rel. 32, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 1 (White protein homolog) 
DE (ATP-binding cassette transporter 8).
GN ABCG1 OR ABC8 OR WHT1.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE OF 3-678 FROM N.A. (ISOFORMS 1 AND 4). 

RC TISSUE=Retina; 

RX MEDLINE=96256850; PubMed=8659545;

RA Chen H.M., Rossier C., Lalioti M.D., Lynn A., Chakravarti A.,

RA Perrin G., Antonarakis S.E.; 

RT "Cloning of the cDNA for a human homologue of the Drosophila white 

RT gene and mapping to chromosome 21q22.3."; 

RL Am. J. Hum. Genet. 59:66-75(1996). 

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RX MEDLINE=20289799; PubMed=10830953;

RA Hattori M., Fujiyama A., Taylor T.D., Watanabe H., Yada

RA Park H.-S., Toyoda A., Ishii K., Totoki Y., Choi D.-K., Groner Y.,

RA Soeda E., Ohki M., Takagi T., Sakaki Y., Taudien S., Blechschmidt K.,

RA Polley A., Menzel U., Delabar J., Kumpf K., Lehmann R., Patterson D.,

RA Reichwald K., Rump A., Schillhabel M., Schudy A., Zimmermann W.,

RA Rosenthal A., Kudoh J., Shibuya K., Kawasaki K., Asakawa S.,

RA Shintani A., Sasaki T., Nagamine K., Mitsuyama S., Antonarakis S.E.,

RA Minoshima S., Shimizu N., Nordsiek G., Hornischer K., Brandt P.,

RA Scharfe M., Schoen O., Desario A., Reichelt J., Kauer G., Bloecker H.,

RA Ramser J., Beck A., Klages S., Hennig S., Riesselmann L., Dagand E.,

RA Wehrmeyer S., Borzym K., Gardiner K., Nizetic D., Francis F.,

RA Lehrach H., Reinhardt R., Yaspo M.-L.;

RT "The DNA sequence of human chromosome 21.";

RL Nature 405:311-319(2000).

RN [3] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RX MEDLINE=20408883; PubMed=10950923;

RA Berry A., Scott H.S., Kudoh J., Talior I., Korostishevsky M.,

RA Wattenhofer M., Guipponi M., Barras C., Rossier C., Shibuya K.,

RA Wang J., Kawasaki K., Asakawa S., Minoshima S., Shimizu N.,

RA Antonarakis S.E., Bonne-Tamir B.; 

RT "Refined localization of autosomal recessive nonsyndromic deafness 

RT DFNB10 locus using 34 novel microsatellite markers, genomic 

RT structure, and exclusion of six known genes in the region."; 

RL Genomics 68:22-29(2000). 

RN [4] 

RP SEQUENCE FROM N.A. (ISOFORM 1). 

RX MEDLINE=21192304; PubMed=11279031;

RA Porsch-Oezcueruemez M., Langmann T., Heimerl S., Borsukova H.,

RA Kaminski W.E., Drobnik W., Honer C., Schumacher C., Schmitz G.;

RT "The zinc finger protein 202 (ZNF202) is a transcriptional repressor

RT of ATP binding cassette transporter A1 (ABCA1) and ABCG1 gene

RT expression and a modulator of cellular lipid efflux."; 

RL J. Biol. Chem. 276:12427-12433(2001). 

RN [5] 

RP SEQUENCE FROM N.A. (ISOFORMS 2; 3; 4; 5; 6 AND 7). 

RX MEDLINE=21092576; PubMed=11162488;

RA Lorkowski S., Rust S., Engel T., Jung E., Tegelkamp K., Galinski E.A.,

RA Assmann G., Cullen P.;

RT "Genomic sequence and structure of the human ABCG1 (ABC8) gene.";

RL Biochem. Biophys. Res. Commun. 280:121-131(2001).



RN [6] 

RP SEQUENCE OF 33-678 FROM N.A. 

RC TISSUE=Fetal brain; 

RX MEDLINE=97186700; PubMed=9034316; 

RA Croop J.M., Tiller G.E., Fletcher J.A., Lux M.L., Raab E.,

RA Goldenson D., Arciniegas S., Son D., Wu R.;

RT "Isolation and characterization of a mammalian homolog of the

RT Drosophila white gene.";

RL Gene 185:77-85(1997).

RN [7] 

RP INDUCTION, AND PROBABLE FUNCTION. 

RX MEDLINE=20261604; PubMed=10799558;

RA Venkateswaran A., Repa J.J., Lobaccaro J.-M.A., Bronson A.,

RA Mangelsdorf D.J., Edwards P.A.;

RT "Human white/murine ABC8 mRNA levels are highly induced in

RT lipid-loaded macrophages. A transcriptional role for specific

RT oxysterols.";

RL J. Biol. Chem. 275:14700-14707(2000). 

RN [8] 

RP INDUCTION, AND PROBABLE FUNCTION. 

RX MEDLINE=20105556; PubMed=10639163;

RA Klucken J., Buechler C., Orso E., Kaminski W.E.,

RA Porsch-Oezcueruemez M., Liebisch G., Kapinsky M., Diederich W.,

RA Drobnik W., Dean M., Allikmets R., Schmitz G.;

RT "ABCG1 (ABC8), the human homolog of the Drosophila white gene, is a

RT regulator of macrophage cholesterol and phospholipid transport."; 

RL Proc. Natl. Acad. Sci. U.S.A. 97:817-822(2000). 

RN [9] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207 ; 

RA Schmitz G., Langmann T., Heimerl S.;

RT "Role of ABCG1 and other ABCG family members in lipid metabolism.";

RL J. Lipid Res. 42:1513-1520(2001). 

CC -!- FUNCTION: Transporter involved in macrophage lipid homeostasis. Is 
CC an active component of the macrophage lipid export complex. Could 

CC also be involved in intracellular lipid transport processes. The 

CC role in cellular lipid homeostasis may not be limited to 

CC macrophages. 

CC -!- SUBUNIT: May form heterodimers with several heterologous partners 
CC of the ABCG subfamily. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. Predominantly 
CC localized in the intracellular compartments mainly associated with 

CC the endoplasmic reticulum (ER) and Golgi membranes. 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=7; 

CC Comment=Additional isoforms seem to exist;

CC Name=1;

CC IsoId=P45844-1; Sequence=Displayed;

CC Name=2; Synonyms=J;

CC IsoId=P45844-2; Sequence=VSP_000047, VSP_000051;

CC Name=3; Synonyms=ABDE;

CC IsoId=P45844-3; Sequence=VSP_000048, VSP_000051;

CC Name=4; Synonyms=G;

CC IsoId=P45844-4; Sequence=VSP_000051;

CC Name=5; Synonyms=F;

CC IsoId=P45844-5; Sequence=VSP_000049, VSP_000051;

CC Name=6; Synonyms=HI;

CC IsoId=P45844-6; Sequence=VSP_000046, VSP_000051;

CC Name=7; Synonyms=C;

CC IsoId=P45844-7; Sequence=VSP_000050, VSP_000051;

CC -!- TISSUE SPECIFICITY: EXPRESSED IN SEVERAL TISSUES. 

CC -!- INDUCTION: Strongly induced in monocyte-derived macrophages during 
CC cholesterol influx. Conversely, mRNA and protein expression are 

CC suppressed by lipid efflux. Induction is mediated by the liver X 

CC receptor/retinoide X receptor (LXR/RXR) pathway. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X91249; CAA62631.1; ALT_INIT. 

DR EMBL; AP001746; BAA95530.1; ALT_INIT.

DR EMBL; AB038161; BAB13728.2; ALT_INIT. 

DR EMBL; AJ289137; CAC00730.1; ALT_INIT. 

DR EMBL; AJ289138; CAC00730.1; JOINED. 

DR EMBL; AJ289139; CAC00730.1; JOINED. 

DR EMBL; AJ289140; CAC00730.1; JOINED. 

DR EMBL; AJ289141; CAC00730.1; JOINED. 

DR EMBL; AJ289142; CAC00730.1; JOINED. 

DR EMBL; AJ289143; CAC00730.1; JOINED. 

DR EMBL; AJ289144; CAC00730.1; JOINED. 

DR EMBL; AJ289145; CAC00730.1; JOINED. 

DR EMBL; AJ289146; CAC00730.1; JOINED. 

DR EMBL; AJ289147; CAC00730.1; JOINED. 

DR EMBL; AJ289148; CAC00730.1; JOINED. 

DR EMBL; AJ289149; CAC00730.1; JOINED. 

DR EMBL; AJ289150; CAC00730.1; JOINED. 

DR EMBL; AJ289151; CAC00730.1; JOINED. 

DR EMBL; AF323658; AAK28836.1; -. 

DR EMBL; AF323644; AAK28836.1; JOINED. 

DR EMBL; AF323645; AAK28836.1; JOINED. 

DR EMBL; AF323646; AAK28836.1; JOINED. 

DR EMBL; AF323647; AAK28836.1; JOINED. 

DR EMBL; AF323648; AAK28836.1; JOINED. 

DR EMBL; AF323649; AAK28836.1; JOINED. 

DR EMBL; AF323650; AAK28836.1; JOINED.

DR EMBL; AF323651; AAK28836.1; JOINED. 

DR EMBL; AF323652; AAK28836.1; JOINED. 

DR EMBL; AF323653; AAK28836.1; JOINED. 

DR EMBL; AF323654; AAK28836.1; JOINED. 

DR EMBL; AF323655; AAK28836.1; JOINED. 

DR EMBL; AF323656; AAK28836.1; JOINED. 

DR EMBL; AF323657; AAK28836.1; JOINED. 

DR EMBL; AF323664; AAK28842.1; 

DR EMBL; AF323658; AAK28833.1; 

DR EMBL; AF323640; AAK28833.1; JOINED. 

DR EMBL; AF323645; AAK28833.1; JOINED. 

DR EMBL; AF323646; AAK28833.1; JOINED. 



DR EMBL; AF323647; AAK28833.1; JOINED.

DR EMBL; AF323648; AAK28833.1; JOINED.

DR EMBL; AF323649; AAK28833.1; JOINED.

DR EMBL; AF323650; AAK28833.1; JOINED.

DR EMBL; AF323651; AAK28833.1; JOINED.

DR EMBL; AF323652; AAK28833.1; JOINED.

DR EMBL; AF323653; AAK28833.1; JOINED.

DR EMBL; AF323654; AAK28833.1; JOINED.

DR EMBL; AF323655; AAK28833.1; JOINED.

DR EMBL; AF323656; AAK28833.1; JOINED.

DR EMBL; AF323657; AAK28833.1; JOINED.

DR EMBL; AF323660; AAK28838.1; -.

DR EMBL; AF323663; AAK28841.1; ALT_INIT.

DR EMBL; AF323658; AAK28835.1; -.

DR EMBL; AF323642; AAK28835.1; JOINED.

DR EMBL; AF323645; AAK28835.1; JOINED.

DR EMBL; AF323646; AAK28835.1; JOINED.

DR EMBL; AF323647; AAK28835.1; JOINED.

DR EMBL; AF323648; AAK28835.1; JOINED.

DR EMBL; AF323649; AAK28835.1; JOINED.



Query Match 17.6%; Score 617; DB 1; Length 678;

Best Local Similarity 25.7%; Pred. No. 5.5e-38;

Matches 173; Conservative 130; Mismatches 266; Indels 104; Gaps 18;

Qy 33 DNSLYFT -- YSGQPN TLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCEL 86

I I : I : I I : I I I I : I I : | | : : :

Db 57 DNNLTEAQRFSSLPRRAAVNIEFRDLSYSV PEGPWWRKKGYKTL 100

Qy 87 GIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRK 146

: : : I I I I : : : I I : I I I I : : : I : : : : I I I : I I I I : II

Db 101 -LKGISGKFNSGELVAIMGPSGAGKSTLMNILAGYRETG--MKGAVLINGLPRDLRCFRK 157

Qy 147 CVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVG 206

: : I : I I I : I I I : I : I : : I I : I : : : : II I I : I I I

Db 158 VSCYIMQDDMLLPHLTVQEAMMVSAHLKLQE--KDEGRREMVKEILTALGLLSCANTRTG 215

Qy 207 NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVL 266

: | | | | : | : | : : | : : | : M : : I I I I I I I I I : : I : I I : I I : :

Db 216 SLSGGQRKRLAIALELVNNPPVMFFDEPTSGLDSASCFQWSLMKGLAQGGRSII 270

Qy 267 ISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTS 326

: : I I I : : I I I I : : : : I : I I : : I I : I I I I I I I I I ::: I

Db 271 CTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPYLRDLGLNCPTYHNPADFVMEVAS 330

Qy 327 IDRRSREQEL -- ATREKAQSLAALFLEKVRDL DDFLWKAET KDLD 369

: : I I I I : I I I : I I I : II

Db 331 GE YGDQNS RLVRAVRE GMCDSDHKRDLGGDAEVNPFLWHRPSEEVKQTKRLKGLR 385

Qy 370 EDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSM 429

: I : : I I I I I : I : I I : : :

Db 386 KDSSSMEGCHSFSASCL TQFCILFKRTFLSIMRDSVLTHLRITSHIGIGL 435

Qy 430 TIGFLYFGHGSIQLSFMDTAALLF MIGALIPFNVILDVISKCYSERAMLYYELE 483

I I I I I I : : : I I I I I : I : I : I I

Db 436 LIGLLYLGIGNEAKKVLSNSGFLFFSMLFLMFAALMP TVLTFPLE 480



Qy 484 DGL YTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLL 534 

I : I : I : I I : :: I : : I II:: hi I 

Db 481 MGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVT^ 540

Qy 535 VWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVS 594

: : : I I : :|:| | | | :: :: I |:| :| 

Db 541 GTMTSLVAQSLGLLIGAASTSLQVATFVGPVTAIPVLLFSGFFVSFDTIPTYLQWMSYIS 600

Qy 595 FLRWCFEGLMKIQF--SRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSG 652

: : I : I I I : : : I : : I I : : : : : I I : I I : I 
Db 601 YVRYGFEGVILSIYGLDREDLHCDIDETCHFQKSEAILRELDVENAKLY-LDFIVLG 656 

Qy 653 GFMVLYYVSLRFI 665

: : : : I I I I 
Db 657 IFFISLRLI 665 



RESULT 15 
YPC3_CAEEL 

ID YPC3_CAEEL STANDARD; PRT; 598 AA. 

AC Q11180; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Putative ABC transporter C05D10.3 in chromosome III. 

GN C05D10.3. 

OS Caenorhabditis elegans. 

OC Eukaryota; Metazoa; Nematoda; Chromadorea; Rhabditida; Rhabditoidea; 

OC Rhabditidae; Peloderinae; Caenorhabditis. 

OX NCBI_TaxID=6239;

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Bristol N2; 

RA Du Z.;

RL Submitted (AUG-1994) to the EMBL/GenBank/DDBJ databases.

RN [2]

RP REVISIONS.

RA Waterston R.;

RL Submitted (SEP-2001) to the EMBL/GenBank/DDBJ databases.

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Potential). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U13645; AAA20989.2; -. 

DR WormPep; C05D10.3; CE29170. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Hypothetical protein; ATP-binding; Transmembrane; Transport. 



FT NP_BIND 27 34 ATP (POTENTIAL).

FT TRANSMEM 336 356 POTENTIAL.

FT TRANSMEM 425 445 POTENTIAL.

FT TRANSMEM 453 473 POTENTIAL.

FT TRANSMEM 478 498 POTENTIAL.

SQ SEQUENCE 598 AA; 66906 MW; 9D6414E06898E343 CRC64;



Query Match 17.1%; Score 600; DB 1; Length 598; 

Best Local Similarity 27.9%; Pred. No. 8.6e-37; 

Matches 170; Conservative 116; Mismatches 260; Indels 64; Gaps 15; 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKC 147

: I : I I I : : I I I : I I M I : : I : : I : I I | | | : I : : : : I : 

Db 10 LHNVSGMAESGKLLAILGSSGAGKTTLMNVLTSRNLTNLDVQGSILIDGRRANKWKIREM 69 



Qy 148 VAHVRQHNQLLPNLTVRETLAFIAQMRL-PRTFSQAQRDKRVEDVIAELRLRQCADTRVG 206 

11:11: : :| II I |:|::|: : :| :| I I I I : : : I : : I I I I : I 
Db 70 SAFVQQHDMFVGTMTAREHLQFMARLRMGDQYYSDHERQLRVEQVLTQMGLKKC7U)TVIG 129 

Qy 207 -NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLV 265 

: : I I I I I : : I : I : : I III I I I I I I I I : I I : : I : I II I 
Db 130 IPNQLKGLSCGEKKRLSFASEILTCPKILFCDEPTSGLDAFMAGHWQ7VLRSLADNGMTV 189 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLT 325 

: I : : I I I I : : I I : I I I I I I I I I I I I I I I I I I I : 

Db 190 IITIHQPSSHVYSLFNNVCLMACGRVIYLGPGDQAVPLFEKCGYPCPAYYNPADHLIRTL 249 

Qy 326 SIDRRSREQELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTC VES 376 

: : I : I : I : I II : I I : I 

Db 250 AVIDSDRATSMKT ISKIR— QGFL STDLGQSVLAIGNANKLRAAS 292 

Qy 377 SVTPLDTNCLPSPTKM PGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSM 429 

MM: || I I I I Ml:::: 

Db 293 FVTGSDTS EKTKTFFNQDYNAS FWTQFLALFWRSWLTVIRDPNLLSVRLLQILITAF 349 

Qy 430 TIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDV 1 SKCYSERAMLYYE 481 

I : : I | | : | : : | | | : : : : I : : I 

Db 350 ITGIVFF QTPVTPATIISINGIM-FNHIRNMNFMLQFPNVPVITAELPIVLRE 401 

Qy 482 LEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFC 541 

: I : I I I I I I : I I I : : Ml II : : I I ■ I : I : 

Db 402 NANGVYRTSAYFLAKNIAELPQYIILPILYNTIVYWMSGLYPNFWNYCFASLVTILITNV 461 

Qy 542 CRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCFE 601 

:: I I : :| Ml I :: : |:| :|: :: :| 

Db 462 AISISYAVATIFANTDVAMTILPIFWPIMAFGGFFITFDAIPSYFKWLSSLSYFKYGYE 521 

Qy 602 GLM KIQFSRRTYKMPLGNLTI AVSGDKILSAMELD-SYPLYAI YLIVIGLSG 652 

| | : : : : : I : : I : : : I : : : I I : I : 

Db 522 ALAINEWDS I KVI PECFNSSMTAFALDSCPKNGHQVLESI DFSASHKI FDI -S ILFGMFI 580 



Qy 653 GFMVLYYVSL 662 



I : : I I : I 
Db 581 GIRIIAYVAL 590 



Search completed: February 27, 2004, 07:12:40 
Job time : 11.4203 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model

Run on:         February 27, 2004, 06:40:43 ; Search time 37.3606 Seconds
                                             (without alignments)
                                             5683.620 Million cell updates/sec

Title:          US-09-989-981A-8
Perfect score:  3506
Sequence:       1 MAGKAAEERGLPKGATPQDT...FMVLYYVSLRFIKQKPSQDW 673

Scoring table:  BLOSUM62
                Gapop 10.0 , Gapext 0.5

Searched:       1017041 seqs, 315518202 residues



Total number of hits satisfying chosen parameters:      1017041

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries



Database :      SPTREMBL_25:*
                1:  sp_archea:*
                2:  sp_bacteria:*
                3:  sp_fungi:*
                4:  sp_human:*
                5:  sp_invertebrate:*
                6:  sp_mammal:*
                7:  sp_mhc:*
                8:  sp_organelle:*
                9:  sp_phage:*
                10: sp_plant:*
                11: sp_rodent:*
                12: sp_virus:*
                13: sp_vertebrate:*
                14: sp_unclassified:*
                15: sp_rvirus:*
                16: sp_bacteriap:*
                17: sp_archeap:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 



Result            Query
   No.    Score   Match  Length  DB  ID        Description

     1   2883.5    82.2     672  11  Q7TSR6    Q7tsr6 mus musculu
     2   2877.5    82.1     672  11  Q7TSR7    Q7tsr7 mus musculu
     3   2871      81.9     673  11  Q8R543    Q8r543 mus musculu
     4   2835.5    80.9     672  11  Q8CIQ5    Q8ciq5 rattus norv
     5    742      21.2     668  10  Q9ARU4    Q9aru4 oryza sativ
     6    739.5    21.1     672  10  Q9LI82    Q9li82 arabidopsis
     7    735.5    21.0     725  10  Q9ZU35    Q9zu35 arabidopsis
     8    735.5    21.0     725  10  Q9ASR9    Q9asr9 arabidopsis
     9    730.5    20.8     648  10  Q9C6W5    Q9c6w5 arabidopsis
    10    723.5    20.6     646  10  Q9C6R7    Q9c6r7 arabidopsis
    11    709      20.2     662  10  Q949Y4    Q949y4 arabidopsis
    12    708      20.2     662  10  Q84TH5    Q84th5 arabidopsis
    13    700      20.0     609  10  Q9C8W6    Q9c8w6 arabidopsis
    14    695.5    19.8     801   5  Q8T691    Q8t691 dictyosteli
    15    686.5    19.6     652  11  Q7TSR8    Q7tsr8 mus musculu
    16    668.5    19.1     737  10  Q9FT51    Q9ft51 arabidopsis
    17    667      19.0     657  11  Q7TMS5    Q7tms5 mus musculu
    18    666      19.0     657  11  Q9R004    Q9r004 mus musculu
    19    665      19.0     687   5  Q9NH94    Q9nh94 bombyx mori
    20    659      18.8     751  10  Q93YS4    Q93ys4 arabidopsis
    21    658      18.8     687   5  Q94960    Q94960 drosophila
    22    657      18.7     657  11  Q80W57    Q80w57 rattus norv
    23    657      18.7     657  11  Q80ST1    Q80st1 rattus norv
    24    653      18.6     656   6  Q8MIB3    Q8mib3 sus scrofa
    25    653      18.6     657  11  Q80XF3    Q80xf3 rattus norv
    26    651.5    18.6     635  10  Q9SZR9    Q9szr9 arabidopsis
    27    647.5    18.5     679   5  Q8IS30    Q8is30 bactrocera
    28    643.5    18.4     643   5  Q7YYX5    Q7yyx5 cryptospori
    29    643      18.3     670   5  O77423    O77423 bactrocera
    30    642.5    18.3     655   4  Q8IX16    Q8ix16 homo sapien
    31    642.5    18.3     655   4  Q96TA8    Q96ta8 homo sapien
    32    642.5    18.3     679   5  Q9BH97    Q9bh97 ceratitis c
    33    634.5    18.1     655   4  Q96LD6    Q96ld6 homo sapien
    34    621.5    17.7     567  10  Q9FG17    Q9fg17 arabidopsis
    35    621      17.7     666  11  Q9EPG9    Q9epg9 rattus norv
    36    620      17.7     662   4  Q86SU8    Q86su8 homo sapien
    37    617.5    17.6     669   5  Q8WRF2    Q8wrf2 tribolium c
    38    617      17.6     692  10  Q7XUM2    Q7xum2 oryza sativ
    39    616.5    17.6     691  10  Q8RWI9    Q8rwi9 arabidopsis
    40    613      17.5     594  10  Q9LJC3    Q9ljc3 arabidopsis
    41    612.5    17.5     669   5  Q8WRR1    Q8wrr1 tribolium c
    42    610.5    17.4     609   5  Q9VQN4    Q9vqn4 drosophila
    43    607.5    17.3     703  10  Q8RXN0    Q8rxn0 arabidopsis
    44    598.5    17.1     785   4  Q96L76    Q96l76 homo sapien
    45    597.5    17.0     692   5  P91892    P91892 aedes aegyp



ALIGNMENTS 



RESULT 1 
Q7TSR6 

ID Q7TSR6 PRELIMINARY; PRT; 672 AA. 

AC Q7TSR6; 

DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 



DE ATP-binding cassette sub-family G member 8. 

GN ABCG8. 

OS Mus musculus (Mouse). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=PERA/Ei; TISSUE=Liver; 

RA Wittenburg H., Lyons M.A., Li R., Churchill G.A., Carey M.C., 

RA Paigen B.; 

RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone 

RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred 

RT Mice."; 

RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases. 

DR EMBL; AY196216; AAO45096.1; 

KW ATP-binding. 

SQ SEQUENCE 672 AA; 75867 MW; CAB720502EA8FE21 CRC64; 

Query Match 82.2%; Score 2883.5; DB 11; Length 672; 

Best Local Similarity 81.9%; Pred. No. 1.2e-215; 

Matches 551; Conservative 52; Mismatches 69; Indels 1; Gaps 1; 

Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 

III II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I 
Db 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

Qy 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

I I I I I I I I I I I I : I I I I I : I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

Qy 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I I I : I I I I I I I I I I I I : I I I I I I I I I I I I I I : I I I I I I I I I I I II I I I I I I I I I I I I 
Db 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240 

I I I I I I I II I I I I I I I I I I I I : I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

Qy 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II I II I I I I I I 
Db 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

Qy 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 

I I I I I : I I : I I I I I I I I I I I I I I I I I I I I I I : I : I : I I I I I I II I I I I I I I I : I I I I 
Db 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

Qy 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

I I I I I : I : I I II:! : : : I I : : I I : I I I I I I I I I I I I I I I I I I I I 

Db 361 WKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419 

Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480 

I : I I I I I I : 11111:111: I I I I I I I I I I I I M I I I I I II I I I I I : I I I : I I I : I I I I 
Db 420 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYY 479 

Qy 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVF 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I II III I I I I : I I I I I I I I I I I I I 



Db 480 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLVWLWF 539 

Qy 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600 

III I I I I I : I : I I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I : I I I I I I I 

Db 540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599 

Qy 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

|||:|||: I : I I I :: II : : I I I : I : I : I I I I I I I I I I I : I I I : III: 
Db 600 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 659 

Qy 661 SLRFIKQKPSQDW 673 

I I : I I I I II! 
Db 660 SLKLIKQKSIQDW 672 



RESULT 2 
Q7TSR7 



ID Q7TSR7 PRELIMINARY; PRT; 672 AA.
AC Q7TSR7;
DT 01-OCT-2003 (TrEMBLrel. 25, Created)
DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE ATP-binding cassette sub-family G member 8.
GN ABCG8.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=I/LnJ; TISSUE=Liver;
RA Wittenburg H., Lyons M.A., Li R., Churchill G.A., Carey M.C.,
RA Paigen B.;
RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
RT Mice.";
RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AY196215; AAO45095.1;
KW ATP-binding.
SQ SEQUENCE 672 AA; 75805 MW; E5B30B5890200A41 CRC64;

Query Match 82.1%; Score 2877.5; DB 11; Length 672;
Best Local Similarity 81.7%; Pred. No. 3.6e-215;
Matches 550; Conservative 52; Mismatches 70; Indels 1; Gaps 1;



Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 
III II I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I 
Db 1 MAEKTKEETQLWNGTVLQDASGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIAS 60 

Qy 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 
I I I I I I I I I I I I : I I I I I : I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I 
Db 61 QVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

Qy 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I I I : I I I II I I I I I II : I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I II II I I I I I I 
Db 121 RGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFS 180 



Qy 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240 

I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

Qy 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 SGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQM 300 

Qy 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 

I M I I : I I : I I I I I I I I I I I I I I I I I I I I I I : I : I : I I I I I M I I I I I I I I I : I I I I 

Db 301 VQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFL 360 

Qy 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

I I I I I : I : I I I I : I : :: I I :: I I : I I I I I I I I I I I I I I I I I I I I 

Db 361 WKAEAKELNTSTHTVSLTLTQDTDC-GTAAELPGMIEQFSTLIRRQISNDFRDLPTLLIH 419 

Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480 

I : I I I I M : llllhlll: I I I I I I I I I I I I I M I I I I I I I I I I I : I I I : I I I : I I I I 
Db 420 GSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYY 479 

Qy 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVF 540 

I I I I I I I I II I I I I I I I I I I I I I I I I : I I I II III I I I I : I I I I I I I I I I I I 
Db 480 ELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHLLLVWLWF 539 

Qy 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600 

III I I I I I : I : I I I I I I : I I I I I I I I I I I I II I I I I : I I II I I I I I : I I I I I I I 
Db 540 CCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWCF 599 

Qy 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

I I I : I I I : I : I I I : : II : : I I I : I : I : I I I I I I I I I I I : I I I = III: 
Db 600 SGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYYL 659 

Qy 661 SLRFIKQKPSQDW 673 

II: MM III 
Db 660 SLKLIKQKSIQDW 672 



RESULT 3
Q8R543
ID Q8R543 PRELIMINARY; PRT; 673 AA.
AC Q8R543;
DT 01-JUN-2002 (TrEMBLrel. 21, Created)
DT 01-JUN-2002 (TrEMBLrel. 21, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE Sterolin 2.
GN ABCG8.
OS Mus musculus (Mouse).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;
RN [1] 

RP SEQUENCE FROM N.A. 
RC STRAIN=129/Sv; 

RA Lu K., Zhou Y., Lee M.-H., Patel S.B.; 

RT "Molecular cloning, genomic structure and characterization of novel 

RT mouse head-to-head tandem ABC transporters."; 

RL Submitted (FEB-2001) to the EMBL/GenBank/DDBJ databases. 



DR EMBL; AF351811; AAL82898.1; 

DR EMBL; AF351799; AAL82898.1; JOINED. 

DR EMBL; AF351800; AAL82898.1; JOINED. 

DR EMBL; AF351801; AAL82898.1; JOINED. 

DR EMBL; AF351802; AAL82898.1; JOINED. 

DR EMBL; AF351803; AAL82898.1; JOINED. 

DR EMBL; AF351804; AAL82898.1; JOINED. 

DR EMBL; AF351805; AAL82898.1; JOINED. 

DR EMBL; AF351807; AAL82898.1; JOINED. 

DR EMBL; AF351808; AAL82898.1; JOINED. 

DR EMBL; AF351809; AAL82898.1; JOINED. 

DR EMBL; AF351810; AAL82898.1; JOINED. 

DR GO; GO:0016020; C:membrane; IEA. 

DR GO; GO:0005524; F:ATP binding; IEA. 

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA. 

DR GO; GO:0006810; P:transport; IEA. 

DR InterPro; IPR003439; ABC_transporter. 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 

SQ SEQUENCE 673 AA; 76008 MW; FA08340445DF259C CRC64; 



Query Match 81.9%; Score 2871; DB 11; Length 673; 

Best Local Similarity 81.8%; Pred. No. 1.2e-214; 

Matches 551; Conservative 52; Mismatches 69; Indels 2; Gaps 2 

Qy 1 MAGKAAEERGLPKGATPQDTS-GLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLA 59 

III II I I II I I I I I I II I I I I I I I II I I I I I I I I I II I I I I I I : I 
Db 1 MAEKTKEETQLWNGTVLQDASQGLQDSLFSSESDNSLYFTYSGQSNTLEVRDLTYQVDIA 60 

Qy 60 SQVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVIT 119 

I I I I I I I I I I I I I : I I I I I : I I I I I I : I I I I I I I I I I I I I I I I II II I I I I I I I I I I 
Db 61 SQVPWFEQLAQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVIT 120 

Qy 120 GRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTF 179 

I I I I I I I : I I I I I I I I I I I I : I I I I I I I I I I I I I I : II I I I I I I I I I I I I I I I I I I I I I I 
Db 121 GRGHGGKMKSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTF 180 

Qy 180 SQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEP 239 

I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I : I I I I I I I I I I I II I I I I I I I I I I I I I 

Db 181 SQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEP 240 

Qy 240 TSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQH 299 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II II I I I II 
Db 241 TSGLDSFTAHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQ 300 

Qy 300 MVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDF 359 

I I I I I I : I I : I I I I I I I I I I I I I I I I I I I I I I : I : I : I I I I I I I I I I I I I I I I : III 
Db 301 MVQYFTSIGHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDF 360 

Qy 360 LWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLI 419 

111111:1: I I I I : I : : : I I : : I I : I I I I I I I I I I I I I I I I I I I 

Db 361 LWKAEAKELNTSTHTVSLTLTQDTDC-GTAVELPGMIEQFSTLIRRQISNDFRDLPTLLI 419 

Qy 420 HGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLY 479 

I I : I I I I I I : I I I I I : I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I II : I I I : I I I 



Db 420 HGSEACLMSLIIGFLYYGHGAKQLSFMDTAALLFMIGALIPFNVILDWSKCHSERSMLY 479 



Qy 480 YELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVV 539 

I I I I I I I I I I I I I I I II I I I I I I I I I I : I I I II III MM : II I I I I I I M II 
Db 480 YELEDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELFLLHFLLWLW 539 

Qy 540 FCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWC 599 

I II I I I I M : I : 1 I I I II : I I I I II M M I I I I I M I Ml M I I M I M M I II 
Db 540 FCCRNMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRWC 599 

Qy 600 FEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYY 659 

I I I I M I I : I Ml I : : II : M I I M M M I I I I I I I I II M I I : III 
Db 600 FSGLMQIQFNGHLYTTQIGNFTFSILGDTMISAMDLNSHPLYAIYLIVIGISYGFLFLYY 659 

Qy 660 VSLRFIKQKPSQDW 673 

M I : II I I III 
Db 660 LSLKLIKQKSIQDW 673 



RESULT 4
Q8CIQ5
ID Q8CIQ5 PRELIMINARY; PRT; 672 AA.
AC Q8CIQ5;
DT 01-MAR-2003 (TrEMBLrel. 23, Created)
DT 01-MAR-2003 (TrEMBLrel. 23, Last sequence update)
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)
DE Sterolin 2.
GN ABCG8.
OS Rattus norvegicus (Rat).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.
OX NCBI_TaxID=10116;
RN [1]
RP SEQUENCE FROM N.A.
RC STRAIN=Sprague-Dawley;
RA Yu H., Lu K., Lee M., Pandit B., Patel S.B.;
RT "The rat Abcg5 and Abcg8: characterization, chromosomal assignment and
RT genetic variation in sitosterolemic rats.";
RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AY145899; AAN64276.1; -.
DR GO; GO:0016020; C:membrane; IEA.
DR GO; GO:0005524; F:ATP binding; IEA.
DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.
DR GO; GO:0006810; P:transport; IEA.
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
SQ SEQUENCE 672 AA; 75906 MW; 2FE0846E71BD9D47 CRC64;



Query Match 80.9%; Score 2835.5; DB 11; Length 672; 

Best Local Similarity 79.9%; Pred. No. 6.7e-212; 

Matches 538; Conservative 57; Mismatches 77; Indels 1; Gaps 



Qy 1 MAGKAAEERGLPKGATPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLAS 60 
II I II I I II I III M I II I I I I I I I II II I I II I I I I I I I I I M I 

Db 1 MAEKTKEETQLWNGTVLQDASSLQDSVFSSESDNSLYFTYSGQSNTLEVRDLTYQVDMAS 60 



Qy 61 QVPWFEQLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG 120 

111111111111:11 I I : I : I I I : I I I I I I I I I I I I I I I I I : I I I I I : I I I I I I I 

Db 61 QVPWFEQLAQFKLPWRSRGSQDSWDLGIRNLSFKVRSGQMLAIIGSAGCGRATLLDVITG 120 

Qy 121 RGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFS 180 

I I I I I : I I I I I I I I I I I I : I I I :: I I I I I I I I : II I I II I I I I I I I I I I I I I I : I I I 
Db 121 RDHGGKMKSGQIWINGQPSTPQLIQKCVAHVRQQDQLLPNLTVRETLTFIAQMRLPKTFS 180 

Qy 181 QAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPT 240 

I I I I I I I I M I I I I I I M I I I : II I I I I I I I : I I I II I I I I I I I I I I I I I I I I I I I I I ! 
Db 181 QAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPT 240 

Qy 241 SGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHM 300 

I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 241 SGLDSFTAHNLVRTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGVAQHM 300 

Qy 301 VQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFL 360 

I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I III: I I I I I I I I I : I I I I 
Db 301 VQYFTSIGYPCPRYSNPADFYVDLTSIDRRSKEQEVATMEKARLLAALFLEKVQGFDDFL 360 

Qy 361 WKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIH 420 

I I I I I I I I I I I I I : :: I I : I I I I I I I I I I I I I I I I I I I I I I I 

Db 361 WKAEAKSLDTGTYAVSQTLTQDTNC-GTAAELPGMIQQFTTLIRRQISNDFRDLPTLFIH 419 

Qy 421 GAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYY 480 

I I I I I I I I : I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I : I I I :: I I I 

Db 420 GAEACLMSLIIGFLYYGHADKPLSFMDMAALLFMIGALIPFNVILDWSKCHSERSLLYY 479 

Qy 481 ELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVF 540 

I I I I I I I I II I I I I I : I I I I I I I I I I : I I I I I I III I I I I I : I I I I I : I : I I I I I 

Db 480 ELEDGLYTAGPYFFAKVLGELPEHCAYVIIYGMPIYWLTNLRPGPELFLLHFMLLWLWF 539 

Qy 541 CCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRWCF 600 

III I I I I I : I : I I I I I I : I I I I I I I I I I I I I I I I I : : I I I I I I I I I : I I I I I I I 
Db 540 CCRTMALAASAMLPTFHMSSFCCNALYNSFYLTAGFMINLNNLWIVPAWISKMSFLRWCF 599 

Qy 601 EGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSGGFMVLYYV 660 

llhlll: I : I I I I : I II : : : I I : I : I : I I I I I I I I I I I : I I I : III: 
Db 600 SGLMQIQFNGHIYTTQIGNLTFSVPGDAMVTAMDLNSHPLYAIYLIVIGISCGFLSLYYL 659 

Qy 661 SLRFIKQKPSQDW 673 

11:11111 III 

Db 660 SLKFIKQKSIQDW 672 



RESULT 5 
Q9ARU4 

ID Q9ARU4 PRELIMINARY; PRT; 668 AA. 

AC Q9ARU4; 

DT 01-JUN-2001 (TrEMBLrel. 17, Created) 

DT 01-JUN-2001 (TrEMBLrel. 17, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Putative ABC transporter. 

GN P0445D12.3. 

OS Oryza sativa (Rice). 



OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; 

OC Spermatophyta; Magnoliophyta; Liliopsida; Poales; Poaceae; 

OC Ehrhartoideae; Oryzeae; Oryza. 

OX NCBI_TaxID=4530; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=cv. Nipponbare; 

RA Sasaki T., Matsumoto T., Yamamoto K.; 

RT "Oryza sativa nipponbare (GA3) genomic DNA, chromosome 1, PAC 

RT clone: P0445D12."; 

RL Submitted (DEC-2000) to the EMBL/GenBank/DDBJ databases. 

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AP003046; BAB40032.1; 

DR Gramene; Q9ARU4; 

DR GO; GO:0016020; C:membrane; IEA. 

DR GO; GO:0005524; F:ATP binding; IEA. 

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA. 

DR GO; GO:0000166; F:nucleotide binding; IEA. 

DR GO; GO:0006810; P:transport; IEA. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter. 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1. 

DR SMART; SM00382; AAA; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 

KW ATP-binding; Transport. 

SQ SEQUENCE 668 AA; 73368 MW; D1875B8C75B0F3B2 CRC64; 

Query Match 21.2%; Score 742; DB 10; Length 668; 

Best Local Similarity 30.3%; Pred. No. 5e-49; 

Matches 188; Conservative 124; Mismatches 252; Indels 56; Gaps 12; 

Qy 75 WTSPSCQNSCELG 1 QNL S FKVRS GQMLAI I GS S GC GRAS LLDVI T GR — GHGGK 126 

I : I : I : I I : : I I :: I I :: I I I I : : I I : I : I : 

Db 58 WARITCALKNKRGDVARFLLSNASGEAKSGRLLALMGPSGSGKTTLLNVLAGQLTASPSL 117 

Qy 127 IKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDK 186 

I I : : I I I : I I : : I : I I I : I I I I I I I : I : : : I I I : : : : 

Db 118 HLSGFLYINGRPISEGGYK — IAYVRQEDLFFSQLTVRETLSLAAELQLRRTLTPERKES 175 

Qy 187 RVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSF 246 

II:: II 111:11: I I I : II I I : : I : I : : I : : I I : I I I I : I I I : I 
Db 176 YWDLLFRLGLWCADS I VGDAKVRGI SGGEKKRLSLACELI AS PS I 1 FADEPTTGLDAF 235 

Qy 247 TAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLG-AAQHMVQYFT 305 

I : : : | | : | I : I : I : I I I I : : I I : : I : : I I I : I I : : I I 

Db 236 QAEKVMETLRQLAEDGHTVICSIHQPRGSVYGKFDDIVLLSEGEVIYMGPAKEEPLLYFA 295 

Qy 306 AIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFLWKAET 365 

::|| II : 111:1 II |:| II ::|:: ::| I II II 
Db 296 SLGYHCPDHVNPAEFLADLI SVDYS SAESVQS SRKRI ENLI EEFSNKV AIT 346 

Qy 366 KDLDEDTCVESSVT-PLDTNCLP S PTK-MPGAVQQFTTLI RRQI SNDFRDLPTLL 418 

: I I : I I : I III I : I I I : I I I I I I 

Db 347 ES NSSLTNPEGSEFSPKLIQKSTTKHRRGWWRQFRLLFKRAWMQAFRDGPTNK 399 



Qy 419 IHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAML 478 

: : :: | ::: | I I I I I : : : III:: 

Db 400 VRARMSVASAIIFGSVFWRMGKTQTSIQDRMGLLQVTAINTAMAALTKTVGVFPKERAIV 459 

Qy 479 YYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLV 538 

I II I I I : I : I I : I I : : I : I I : : I I I : I : 

Db 460 DRERAKGSYALGPYLSSKLLAEIPIGAAFPLIFGSILYPMSKLHPTFSRFAKFCGIVTVE 519 

Qy 539 VFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPAWISKVSFLRW 598 

I II 1:11 I : I | : | | : : | : : I I I I I : I I 

Db 520 SFAASAMGLTVGAMAPTTEAAMALGPSLMTVFIVFGGYYWPDNTPVIFRWIPKVSLIRW 579 

Qy 599 CFEGLMKIQF SRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYAIYLIVIGLSG 652 

1:11 : I : : | : | | : : | | : : : 

Db 580 AFQGLCINEFKGLQFEQQHSYDIQTGE QALERFSLGGIRIADTLVAQ 626 

Qy 653 GFMVLYYVSLRFI KQKP 669 

I ::::: I :: I :| 
Db 627 GRI LMFWYWLT YLLLKKNRP 646 



RESULT 6 
Q9LI82 

ID Q9LI82 PRELIMINARY; PRT; 672 AA. 

AC Q9LI82; 

DT 01-OCT-2000 (TrEMBLrel. 15, Created) 

DT 01-OCT-2000 (TrEMBLrel. 15, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ABC transporter-like protein. 

OS Arabidopsis thaliana (Mouse-ear cress). 

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; 

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids; 

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis. 

OX NCBI_TaxID=3702; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Columbia; 

RA Kaneko T., Kato T., Sato S., Nakamura Y., Asamizu E., Tabata S.; 

RL Submitted (MAR-2000) to the EMBL/GenBank/DDBJ databases. 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Columbia; 

RX MEDLINE=20363099; PubMed=10907853; 

RA Nakamura Y.; 

RT "Structural analysis of Arabidopsis thaliana chromosome 3. II. 

RT Sequence features of the regions of 4,251,695 bp covered by ninety P1, 

RT TAC and BAC clones."; 

RL DNA Res. 7:217-221(2000). 

DR EMBL; AP001313; BAB03081.1; -. 

DR GO; GO:0016020; C:membrane; IEA. 

DR GO; GO:0005524; F:ATP binding; IEA. 

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA. 

DR GO; GO:0000166; F:nucleotide binding; IEA. 

DR GO; GO:0006810; P:transport; IEA. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter. 

DR InterPro; IPR006162; Ppantne_S. 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1. 

DR SMART; SM00382; AAA; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW ATP-binding. 

SQ SEQUENCE 672 AA; 75269 MW; 20B2D99215600135 CRC64; 



Query Match 21.1%; Score 739.5; DB 10; Length 672; 

Best Local Similarity 30.6%; Pred. No. 7.9e-49; 

Matches 221; Conservative 126; Mismatches 251; Indels 125; Gaps 27; 

Qy 7 EERGLPK— GATPQDTSGLQDRLFSSES DN SLYFTYSGQPN 45 

: I || | : I I : I : I I I I II I I : I 

Db 7 QESSFPKTPSANRHETSPVQENRFSSPSHVNPCLDDDNDHDGPSHQSRQSSVLRQSLRPI 66 

Qy 46 TLEVRDLNYQVD LASQVPWFEQLAQFKMPWT S PS CQNS CELGI QNLS FKVR 96 

I : : I I : I I I : I III I : 

Db 67 ILKFEELTYSIKSQTGKGSYWFGSQEPKPNRLVL KCVSGI VK 108 

Qy 97 SGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQ 156 

I : : I I : : I I I I : : I : : I I I I : I I : II : I : : I I I I : 

Db 109 PGELLAMLGPSGSGKTTLVTALAGRLQ-GKL-SGTVSYNGEPFTSSVKRK-TGFVTQDDV 165 

Qy 157 LLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGG 216 

I I : I I I I I I : I : I I I : : : : : : : I I I : : : I I : I : : : I : I I : I I I 
Db 166 LYPHLTVMETLTYTALLRLPKELTRKEKLEQVEMWSDLGLTRCCNSVIGGGLIRGISGG 225 

Qy 217 ERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDI 276 

11:11111 :: I II : I : I I I I I I I I I I II : I II I I : I I I : -III I : 
Db 226 ERKRVSIGQEMLVNPSLLLLDEPTSGLDSTTAARIVATLRSLARGGRTWTTIHQPSSRL 285 

Qy 277 FRLFDLVLLMTSGTPIYLGAAQHMVQYFTAI GY-PCPRYSNPADFYVDLT SIDRRSR 332 

: I : I I I I : : : I I I I I : : : : I I : I I I I : I I I I I : I I : I : 

Db 286 YRMFDKVLVLSEGCPIYSGDSGRVMEYFGSIGYQPGSSFWPADFVLDLANGITSDTKQY 345 

Qy 333 EQELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVT-PLD 382 

: I : I : I I : : I : I : I I I I I I 

Db 346 DQ-IETNGRLDR LEEQNSVKQSLISSYKKNLYPPLKEEVSRTFPQDQTNARLRKK 399 

Qy 383 --TNCLPSPTKMPGAVQQFTTLIRRQIS -NDFRDLPTLLIHGAEACLMSMTIGFLYF 436 

Mh I I I : I : : I : II : : : | : | | : : 

Db 400 AITNRWPTSWWM QFSVLLKRGLKERSHESFSGLRIFMVMS VSLLSGLLWW 449 

Qy 437 GHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAK 496 

I : I III I : : I I I I I I I : I I : I : 

Db 450 -HSRV-AHLQDQVGLLFFFSIFWGFFPLFNAIFTFPQERPMLIKERSSGIYRLSSYYIAR 507 

Qy 497 ILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTF 556 

: I : I I I : III: I : I I I : : : : I I : : I I I : I 

Db 508 TVGDLPMELILPTIFVTITYWMGGLKPSLTTFIMTLMIVLYNVLVAQGVGLALGAILMDA 567 

Qy 557 HMAS FFSNALYNS FYLAGGFMINLS SLWTVP AWISKVSFLRWCFEGLMKIQFS-RRT 612 

I : I : I I I I I I : I : I I I : III : I : : I : : I : : 

Db 568 KKAATLSSVLMLVFLLAGGYYIQ HIPGFIAWLKYVSFSHYCYKLLVGVQYTWDEV 622 



Qy 613 YKMPLG NLTIAVSGDKILSAMELDSYPLYAI YLIVIGLSGGFMVLYYV 660 

I : | III I : : : I I : I : : : I I I : 

Db 623 YECGSGLHCSVMDYEGIKNLRI GNMMWDVLAL AVMLLL YRVLAYL 667 

Qy 661 SLR 663 

: I I 

Db 668 ALR 670 



RESULT 7 
Q9ZU35 

ID Q9ZU35 PRELIMINARY; PRT; 725 AA. 

AC Q9ZU35; 

DT 01-MAY-1999 (TrEMBLrel. 10, Created) 

DT 01-MAY-1999 (TrEMBLrel. 10, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Putative ABC transporter. 

GN AT2G01320. 

OS Arabidopsis thaliana (Mouse-ear cress).

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis.

OX NCBI_TaxID=3702;

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=cv. Columbia; 

RX MEDLINE=20083487; PubMed=10617197;

RA Lin X., Kaul S., Rounsley S.D., Shea T.P., Benito M.-I., Town C.D.,

RA Fujii C.Y., Mason T.M., Bowman C.L., Barnstead M.E., Feldblyum T.V.,

RA Buell C.R., Ketchum K.A., Lee J.J., Ronning C.M., Koo H., Moffat K.S.,

RA Cronin L.A., Shen M., VanAken S.E., Umayam L., Tallon L.J., Gill J.E.,

RA Adams M.D., Carrera A.J., Creasy T.H., Goodman H.M., Somerville C.R.,

RA Copenhaver G.P., Preuss D., Nierman W.C., White O., Eisen J.A.,

RA Salzberg S.L., Fraser C.M., Venter J.C.;

RT "Sequence and analysis of chromosome 2 of the plant Arabidopsis 

RT thaliana."; 

RL Nature 402:761-768(1999). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=cv. Columbia;

RA Lin X.;

RL Submitted (MAR-2000) to the EMBL/GenBank/DDBJ databases.

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AC006200; AAD14532.1; -. 

DR PIR; C84423; C84423. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.



KW ATP-binding; Transport. 

SQ SEQUENCE 725 AA; 78899 MW; 7DB2E556FE3553D7 CRC64; 

Query Match 21.0%; Score 735.5; DB 10; Length 725; 

Best Local Similarity 30.0%; Pred. No. 1.8e-48; 

Matches 186; Conservative 123; Mismatches 229; Indels 81; Gaps 15; 

Qy 75 WTSPSC QNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITG RG 122 

I : : I I : : [ : | : : | : : | M : | | | | : : | | : | : | | 

Db 72 WRNITCSLSDKSSKSVRFLLKNVSGEAKPGRLLAIMGPSGSGKTTLLNVLAGQLSLSPRL 131 

Qy 123 HGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQA 182 

I I I : : I I : II I : : : I I I I : I I I I I I I : I I : : : I I I 

Db 132 H LSGLLEVNGKPSSSKAYK — LAFVRQEDLFFSQLTVRETLSFAAELQLPEISSAE 185 

Qy 183 QRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSG 242 



Db 186 ERDEYVNNLLLKLGLVSCADSCVGDAKVRGISGGEKKRLSLACELIASPSVIFADEPTTG 245

Qy 243 LDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLG-AAQHMV 301 

I I : I I : : : I I : I I : I : I : I I I I : : I I : : I : I I I : I I I : ■ 

Db 246 LDAFQAEKVMETLQKLAQDGHTVICSIHQPRGSVYAKFDDIVLLTEGTLVYAGPAGKEPL 305 

Qy 302 QYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFLW 361 



Db 306 TYFGNFGFLCPEHVNPAEFLADLISVDYSSSETVYSSQKRVHALVDAFSQR 356

Qy 362 KAETKDLDEDTCVESSV TPLDTNCLPSPTK MPGAVQQFTTLIRR 405 

III III : II I : I I I : : I 

Db 357 SSSVLYATPLS MKEETKNGMRPRRKAIVERTDGWWRQFFLLLKR 400 

Qy 406 QISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVIL 465

I I II : : : : | : : : I I I I I I : I : I : 
Db 401 AWMQAS RDGPTNKVRARMSVASAVIFGSVFWRMGKSQTSIQDRMGLL-QVAAI NTAM 456 

Qy 466 DVISKCY SERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIII YGMPTYWLANL 521 

: : I III:: I I I : I I I : I : I : I I : : : : I I : I I 

Db 457 AALTKTVGVFPKERAIVDRERSKGSYSLGPYLLSKTIAEIPIGAAFPLMFGAVLYPMARL 516

Qy 522 RPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLS 581

III : I : I II I :: I : I : I I : I I : : I 

Db 517 NPTLSRFGKFCGIVTVESFAASAMGLTVGAMVPSTEAAMAVGPSLMTVFIVFGGYYVNAD 576

Qy 582 SLWTVPAWISKVSFLRWCFEGLMKIQFS RRT YKMP LGNLT IAVSGDKILSA 632 

: : I I : I : I I I : I I : I I : I : : I :: I :| 

Db 577 NTPIIFRWIPRASLIRWAFQGLCINEFSGLKFDHQNTFDVQTGEQALERLSFGGRRIRET 636 

Qy 633 MELDSYPLY AIYLIV 647 

: II III:: 
Db 637 IAAQSRILMFWYSATYLLL 655 



RESULT 8 
Q9ASR9 

ID Q9ASR9 PRELIMINARY; PRT; 725 AA. 

AC Q9ASR9; 

DT 01-JUN-2001 (TrEMBLrel. 17, Created) 



DT 01-JUN-2001 (TrEMBLrel. 17, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update)

DE At2g01320/F10A8.20. 

OS Arabidopsis thaliana (Mouse-ear cress).

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis.

OX NCBI_TaxID=3702;

RN [1] 

RP SEQUENCE FROM N.A. 

RA Cheuk R., Chen H., Kim C.J., Meyers M.C., Shinn P., Banh J., 

RA Bowser L., Carninci P., Chung M.K., Goldsmith A.D., Hayashizaki Y.,

RA Ishida J., Jones T., Kamiya A., Karlin-Neumann G., Kawai J., Lam B.,

RA Lee J.M., Lin J., Liu S.X., Miranda M., Narusaka M., Nguyen M.,

RA Palm C.J., Pham P.K., Quach H.L., Sakano H., Sakurai T., Satou M.,

RA Seki M., Southwick A., Toriumi M., Yamada K., Yu G., Shinozaki K.,

RA Davis R.W., Theologis A., Ecker J.R.;

RT "Arabidopsis cDNA clones.";

RL Submitted (MAR-2001) to the EMBL/GenBank/DDBJ databases.

RN [2] 

RP SEQUENCE FROM N.A. 

RA Cheuk R., Chen H., Kim C.J., Shinn P., Banh J., Bowser L., 

RA Carninci P., Chang E., Dale J.M., Goldsmith A.D., Hayashizaki Y.,

RA Ishida J., Jones T., Kamiya A., Karlin-Neumann G., Kawai J., Lam B.,

RA Lee J.M., Lin J., Miranda M., Narusaka M., Nguyen M., Onodera C.S.,

RA Palm C.J., Quach H.L., Sakurai T., Satou M., Seki M., Southwick A.,

RA Tang C.C., Toriumi M., Wu H.C., Yamada K., Yamamura Y., Yu G., Yu S.,

RA Shinozaki K., Davis R.W., Theologis A., Ecker J.R.;

RT "Arabidopsis ORF clones.";

RL Submitted (JUL-2002) to the EMBL/GenBank/DDBJ databases.

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AF367318; AAK32905.1; -.

DR EMBL; AY133617; AAM91447.1; -.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transport. 

SQ SEQUENCE 725 AA; 78998 MW; 68A7E556FE2FE3D7 CRC64; 

Query Match 21.0%; Score 735.5; DB 10; Length 725; 

Best Local Similarity 30.0%; Pred. No. 1.8e-48; 

Matches 186; Conservative 123; Mismatches 229; Indels 81; Gaps 15 

Qy 75 WTSPSC QNS CELGI QNLS FKVRSGQMLAI I GS S GCGRAS LLDVI TG RG 122 

I : : I I : : | : | : : | : : | | | : | | | | : : | | : I : I I 

Db 72 WRNITCSLSDKSSKSVRFLLKNVSGEAKPGRLLAIMGPSGSGKTTLLNVLAGQLSLSPRL 131

Qy 123 HGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQA 182 

I I I : : I I : I I I : : : I I I I : I I I I I I I : I I : : : I I I 

Db 132 H LSGLLEVNGKPSSSKAYK — LAFVRQEDLFFSQLTVRETLSFAAELQLPEISSAE 185 



Qy 183 QRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSG 242 

: I I : I : : : : I I I I I : I I : I I I : I I I I : : I : I : : I : : I : : I I I I : I 
Db 186 ERDEYVNNLLLKLGLVSCADSCVGDAKVRGISGGEKKRLSLACELIASPSVIFADEPTTG 245

Qy 243 LDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLG-AAQHMV 301 

I I : I I : : : I I : I I : I : I : I II I : : I I : : I : I I I : I I I : : 

Db 246 LDAFQAEKVMETLQKLAQDGHTVICSIHQPRGSVYAKFDDIVLLTEGTLVYAGPAGKEPL 305 

Qy 302 QYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKAQSLAALFLEKVRDLDDFLW 361

II I : I I : I I I : I I I I : I II : : : : : : I I : : 

Db 306 TYFGNFGFLCPEHVNPAEFLADLISVDYSSSETVYSSQKRVHALVDAFSQR 356

Qy 362 KAETKDLDEDTCVESSV TPLDTNCLPSPTK MPGAVQQFTTLIRR 405 

III III : II I : I I I : : I 

Db 357 SSSVLYATPLS MKEETKNGMRPRRKAIVERTDGWWRQFFLLLKR 400 

Qy 406 QISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVIL 465

MM: : :: | ::: | | | | I I : I : I : 
Db 401 AWMQASRDGPTNKVRARMSVASAVIFGSVFWRMGKSQTSIQDRMGLL-QVAAI NTAM 456 

Qy 466 DVISKCY SERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANL 521 

: : I MM: I I I : I I I : I : I : I I : : : : I I : I I 

Db 457 AALTKTVGVFPKERAIVDRERSKGSYSLGPYLLSKTIAEIPIGAAFPLMFGAVLYPMARL 516 

Qy 522 RPGLQPFLLHFLLVWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLS 581

III : I : I II I :: I : I : I I : I I : : I 

Db 517 NPTLSRFGKFCGIVTVESFAASAMGLTVGAMVPSTEAAMAVGPSLMTVFIVFGGYYVNAD 576 

Qy 582 SLWTVPAWISKVS FLRWCFEGLMKIQFS RRTYKMPLGNLT IAVSGDKILSA 632 

: : | | : | : | | | : I | : I I : I : : | : : I : I 

Db 577 NTPIIFRWIPRASLIRWAFQGLCINEFSGLKFDHQNTFDVQTGEQALERLSFGGRRIRET 636 

Qy 633 MELDSYPLY AIYLIV 647 

: II III:: 
Db 637 IAAQSRILMFWYSATYLLL 655 



RESULT 9 
Q9C6W5 

ID Q9C6W5 PRELIMINARY; PRT; 648 AA. 

AC Q9C6W5; 

DT 01-JUN-2001 (TrEMBLrel. 17, Created) 

DT 01-JUN-2001 (TrEMBLrel. 17, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Hypothetical protein (ABC transporter, putative).

GN F27M3_2 OR AT1G31770/F27M3_2.

OS Arabidopsis thaliana (Mouse-ear cress).

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis.

OX NCBI_TaxID=3702;

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=cv. Columbia;

RX MEDLINE=21016719; PubMed=11130712;

RA Theologis A., Ecker J.R., Palm C.J., Federspiel N.A., Kaul S., 

RA White O., Alonso J., Altafi H., Araujo R., Bowman C.L., Brooks S.Y.,

RA Buehler E., Chan A., Chao Q., Chen H., Cheuk R.F., Chin C.W.,

RA Chung M.K., Conn L., Conway A.B., Conway A.R., Creasy T.H., Dewar K.,

RA Dunn P., Etgu P., Feldblyum T.V., Feng J.-D., Fong B., Fujii C.Y.,

RA Gill J.E., Goldsmith A.D., Haas B., Hansen N.F., Hughes B., Huizar L.,

RA Hunter J.L., Jenkins J., Johnson-Hopson C., Khan S., Khaykin E.,

RA Kim C.J., Koo H.L., Kremenetskaia I., Kurtz D.B., Kwan A., Lam B.,

RA Langin-Hooper S., Lee A., Lee J.M., Lenz C.A., Li J.H., Li Y.-P.,

RA Lin X., Liu S.X., Liu Z.A., Luros J.S., Maiti R., Marziali A.,

RA Militscher J., Miranda M., Nguyen M., Nierman W.C., Osborne B.I.,

RA Pai G., Peterson J., Pham P.K., Rizzo M., Rooney T., Rowley D.,

RA Sakano H., Salzerg S.L., Schwartz J.R., Shinn P., Southwick A.M.,

RA Sun H., Tallon L.J., Tambunga G., Toriumi M.J., Town C.D.,

RA Utterback T., Van Aken S., Vaysberg M., Vysotskaia V.S., Walker M.,

RA Wu D., Yu G., Fraser C.M., Venter J.C., Davis R.W.;

RT "Sequence and analysis of chromosome 1 of the plant Arabidopsis 

RT thaliana."; 

RL Nature 408:816-820(2000). 

RN [2] 

RP SEQUENCE FROM N.A. 

RA Haas B.J., Volfovsky N., Town C.D., Troukhan M., Alexandrov N.,

RA Feldmann K.A., Flavell R.B., White O., Salzberg S.L.;

RT "Full-length messenger RNA sequences greatly improve genome 

RT annotation."; 

RL Genome Biol. 0:0-0(2002). 

RN [3] 

RP SEQUENCE FROM N.A. 

RA Brover V., Troukhan M., Alexandrov N., Lu Y.-P., Flavell R.,

RA Feldmann K.;

RT "Full-Length cDNA from Arabidopsis thaliana.";

RL Submitted (MAR-2002) to the EMBL/GenBank/DDBJ databases.

RN [4] 

RP SEQUENCE FROM N.A. 

RC STRAIN=cv. Columbia; 

RA Seki M., Iida K., Satou M., Sakurai T., Akiyama K., Ishida J.,

RA Nakajima M., Enju A., Kamiya A., Narusaka M., Carninci P., Kawai J.,

RA Hayashizaki Y., Shinozaki K.;

RT "Arabidopsis thaliana full-length cDNA.";

RL Submitted (NOV-2002) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AC074360; AAG60152.1; -.

DR EMBL; AY088793; AAM67104.1; -.

DR EMBL; AK117530; BAC42192.1; -.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Hypothetical protein. 

SQ SEQUENCE 648 AA; 72618 MW; D52A2D2434A5BB9D CRC64;

Query Match 20.8%; Score 730.5; DB 10; Length 648; 

Best Local Similarity 30.7%; Pred. No. 3.8e-48; 

Matches 211; Conservative 117; Mismatches 269; Indels 91; Gaps 19; 



Qy 9 RGLPKGATPQDTSGLQDRLFSSES — DNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFE 66 

: I I I : I II :|: |:| | ||: :: |:| : I 
Db 19 QGLPDMSDTQSKSVLAFPTITSQPGLQMSMY PITLKFEEWYKVKI E 65 

Qy 67 QLAQFKMPWTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGK 126 

I : I II : : : : I I : I I : : I I I I : : I I : I I I 

Db 66 QTSQCMGSWKSKE KTILNGITGMVCPGEFLAMLGPSGSGKTTLLSALGGR — LSK 118 

Qy 127 IKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDK 186 

II:: I I I I I : I : I I : I I : I I I I I I I I : I I I : : : : : : 
Db 119 TFSGKVMYNGQPFSGCIKRR-TGFVAQDDVLYPHLTVWETLFFTALLRLPSSLTRDEKAE 177 

Qy 187 RVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSF 246

I : I I I I I I : I : : : I I I : I I II : : I I I I I : : I II : I : I I I I I I I I I I 

Db 178 HVDRVIAELGLNRCTNSMIGGPLFRGISGGEKKRVSIGQEMLINPSLLLLDEPTSGLDST 237 

Qy 247 TAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTA 306 

I I I : I I : I I I I I I : : : I I I I I : : I I I : I : : I : I I I I I I I : I I : : 

Db 238 TAHRI VTT I KRLAS GGRTWTT I HQP S S RI YHMFDKWLLSEGS P I YYGAAS SAVEYFS S 297 

Qy 307 IGYPCPRYSNPADFYVDLTS IDRRSREQELATREKAQSLAALFLEKVRDLDDFLW 361 

: I : I I I I : I I : : : I I I I : : : I : : : : 
Db 298 LGFSTSLTVNPADLLLDLANGIPPDTQKETSEQEQKTVK — ETLVSAYEKNI 347 

Qy 362 KAETKDLDEDTCVESS VTPLDTNCLPSPTKMPGAVQQFTTLIRRQI-SNDFRDLPT 416 

i I I : I I I II I I I I :: I : I 

Db 348 — STK-LKAELCNAESHSYEYTKAAAKNLKSEQWCTTWWYQFTVLLQRGVRERRFESFNK 404 

Qy 417 LLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERA 476

II : : | | : | : : I I I I I I : : : I : 

Db 405 LRIF QVISVAFLGGLLWWH-TPKSHIQDRTALLFFFSVFWGFYPLYNAVFTFPQEKR 460 

Qy 477 MLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVW 536 

II I I : I | | | : : | : | | I : I I : I : I I : I I : I 

Db 461 MLIKERSSGMYRLSSYFMARNVGDLPLELALPTAFVFIIYWMGGLKPDPTTFILSLLVVL 520

Qy 537 LWFCCRIMALAAAALLPTFHMAS FFSNALYNS FYLAGGFMINLS SLWTVP AWISKV 593 

I : : I I I I I I : : : | : | | | : : : I I : : 

Db 521 YSVLVAQGLGLAFGALLMNIKQATTLASVTTLVFLIAGGYYVQ QIPPFIVWLKYL 575

Qy 594 SFLRWCFEGLMKIQFSRRTY KMPLGNLTIAVSGDKIL SAMEL 635 

I: :|:: I: I I:: I I I I I I I : : I 

Db 576 SYSYYCYKLLLGIQYTDDDYYECSKGVWCRVGDFPAIKSMGLNNLWI DVFVMGVML 631 

Qy 636 DSYPLYAIYLIVIGLSGGFMVLYYVSLR 663 

III : I I : I I I 

Db 632 VGYRLMA YMALHRVKLR 648

RESULT 10 
Q9C6R7 

ID Q9C6R7 PRELIMINARY; PRT; 646 AA. 

AC Q9C6R7; 

DT 01-JUN-2001 (TrEMBLrel. 17, Created) 
DT 01-JUN-2001 (TrEMBLrel. 17, Last sequence update) 
DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 
DE ABC transporter, putative. 



GN F5M6.22. 

OS Arabidopsis thaliana (Mouse-ear cress).

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis.

OX NCBI_TaxID=3702;

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=cv. Columbia; 

RX MEDLINE=21016719; PubMed=11130712;

RA Theologis A., Ecker J.R., Palm C.J., Federspiel N.A., Kaul S.,

RA White O., Alonso J., Altafi H., Araujo R., Bowman C.L., Brooks S.Y.,

RA Buehler E., Chan A., Chao Q., Chen H., Cheuk R.F., Chin C.W.,

RA Chung M.K., Conn L., Conway A.B., Conway A.R., Creasy T.H., Dewar K.,

RA Dunn P., Etgu P., Feldblyum T.V., Feng J.-D., Fong B., Fujii C.Y.,

RA Gill J.E., Goldsmith A.D., Haas B., Hansen N.F., Hughes B., Huizar L.,

RA Hunter J.L., Jenkins J., Johnson-Hopson C., Khan S., Khaykin E.,

RA Kim C.J., Koo H.L., Kremenetskaia I., Kurtz D.B., Kwan A., Lam B.,

RA Langin-Hooper S., Lee A., Lee J.M., Lenz C.A., Li J.H., Li Y.-P.,

RA Lin X., Liu S.X., Liu Z.A., Luros J.S., Maiti R., Marziali A.,

RA Militscher J., Miranda M., Nguyen M., Nierman W.C., Osborne B.I.,

RA Pai G., Peterson J., Pham P.K., Rizzo M., Rooney T., Rowley D.,

RA Sakano H., Salzerg S.L., Schwartz J.R., Shinn P., Southwick A.M.,

RA Sun H., Tallon L.J., Tambunga G., Toriumi M.J., Town C.D.,

RA Utterback T., Van Aken S., Vaysberg M., Vysotskaia V.S., Walker M.,

RA Wu D., Yu G., Fraser C.M., Venter J.C., Davis R.W.;

RT "Sequence and analysis of chromosome 1 of the plant Arabidopsis 

RT thaliana."; 

RL Nature 408:816-820(2000). 

DR EMBL; AC079041; AAG50724.1; -.

DR PIR; C86441; C86441.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding. 

SQ SEQUENCE 646 AA; 72342 MW; 7A9624F82FD88A6E CRC64; 

Query Match 20.6%; Score 723.5; DB 10; Length 646; 

Best Local Similarity 30.6%; Pred. No. 1.3e-47; 

Matches 208; Conservative 119; Mismatches 262; Indels 91; Gaps 20; 

Qy 22 GLQDRLFSSESDNSLYF-TYSGQPN TLEVRDLNYQVDLASQVPWFEQLAQFKMP 74 

III: : : I : I I I : I I : : : : : I : I : I I : I 

Db 20 GLPD-MSDTQSKSVLAFPTITSQPGLQMSMYPITLKEWYKVKI EQTSQCMGS 71

Qy 75 WTSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWI 134 

II : : : : I I : I I : : I I I I : : I I : I I I II:: 

Db 72 WKSKE KTILNGITGMVCPGEFLAMLGPSGSGKTTLLSALGGR — LSKTFSGKVMY 124 



Qy 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194 

I I I I I : I : | | : | | : | | | | | | | | : | | | :::::: | : | | I | 

Db 125 NGQPFSGCIKRR-TGFVAQDDVLYPHLTVWETLFFTALLRLPSSLTRDEKAEHVDRVIAE 183

Qy 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254 

I I : I : : : I I I : I I I I :: I I I I I : : I II : I : I I I I I I I I I I III : I I 

Db 184 LGLNRCTNSMIGGPLFRGISGGEKKRVSIGQEMLINPSLLLLDEPTSGLDSTTAHRIVTT 243 

Qy 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314

: Ml I I I : : : I II I I : Ml I M : : I M I I II I I M I : : M : 
Db 244 I KRLAS GGRT WTT I HQP S S RI YHMFDKWLLS EGS P I YYGAAS SAVEYFS S LGFSTSLT 303 

Qy 315 SNPADFYVDLTS IDRRSREQELATREKAQSLAALFLEKVRDLDDFLWKAETKDLD 369 

I I I I Ml: : : I I I I : : : | : : : : III 
Db 304 VNPADLLLDLANGIPPDTQKETSEQEQKTVK— ETLVSAYEKNI STK-LK 350 

Qy 370 EDTCVESS VTPLDTNCLPSPTKMPGAVQQFTTLIRRQI-SNDFRDLPTLLIHGAEA 424 

: I I I II I I I I : M : I II 
Db 351 AELCNAESHSYEYTKAAAKNLKSEQWCTTWWYQFTVLLQRGVRERRFESFNKLRIF Q 407

Qy 425 CLMSMTIGFLYFGHGSIQLSFMDTAALLFMIGALIPFNVILDVISKCYSERAMLYYELED 484

: M I : I : : I I I I I I : : : hill 

Db 408 VISVAFLGGLLWWH-TPKSHIQDRTALLFFFSVFWGFYPLYNAVFTFPQEKRMLIKERSS 466 

Qy 485 GLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFCCRI 544

I M I I I : M M I I : II: I : I I : I I M I : 

Db 467 GMYRLSSYFMARNVGDLPLELALPTAEVFIIYWMGGLKPDPTTFILSLLVVLYSVLVAQG 526 

Qy 545 MALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVP AWISKVSFLRWCFE 601 

: I I III I : : : I Mlh : M I : M : M : : 

Db 527 LGLAFGALLMN I KQATT LAS VTT LVFL 1 AGG YYVQ QI PPFIVWLKYLS YS YYCYK 581 

Qy 602 GLMKIQFSRRTY KMPLGNLTIAVSGDKILSAMELDSYPLYAI 643 

I : I I : : I I I I I I I : : I I I I 

Db 582 LLLGIQYTDDDYYECSKGVWCRVGDFPAIKSMGLNNLWI DVFVMGVMLVGYRLMA- 636 

Qy 644 YLIVIGLSGGFMVLYYVSLR 663

M I : I I I 

Db 637 YMALHRVKLR 646 



RESULT 11 
Q949Y4 

ID Q949Y4 PRELIMINARY; PRT; 662 AA. 

AC Q949Y4; 

DT 01-DEC-2001 (TrEMBLrel. 19, Created) 

DT 01-DEC-2001 (TrEMBLrel. 19, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Putative ABC transporter protein. 

GN F17M19.11. 

OS Arabidopsis thaliana (Mouse-ear cress).

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis.

OX NCBI_TaxID=3702;

RN [1] 



RP SEQUENCE FROM N.A. 

RA Yamada K., Liu S.X., Pham P.K., Banh J., Banno F., Dale J.M.,

RA Goldsmith A.D., Jiang P.X., Lee J.M., Onodera C.S., Quach H.L.,

RA Tang C., Toriumi M., Yamamura Y., Yu G., Yu S., Bowser L.,

RA Carninci P., Chen H., Cheuk R., Hayashizaki Y., Ishida J., Jones T.,

RA Kamiya A., Karlin-Neumann G., Kawai J., Kim C., Koesema E., Lam B.,

RA Lin J., Meyers M.C., Miranda M., Narusaka M., Nguyen M., Palm C.J.,

RA Sakurai T., Satou M., Seki M., Shinn P., Southwick A., Tracy S.E.,

RA Shinozaki K., Davis R.W., Ecker J.R., Theologis A.;

RT "Full Length cDNA of gene F17M19.11 (GI:12324545).";

RL Submitted (AUG-2001) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AY050810; AAK92745.1; -. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding. 

SQ SEQUENCE 662 AA; 72903 MW; CD5BC0853261BC45 CRC64; 



Query Match 20.2%; Score 709; DB 10; Length 662; 

Best Local Similarity 31.2%; Pred. No. 1.8e-46; 

Matches 216; Conservative 107; Mismatches 233; Indels 136; Gaps 23 

Qy 44 PNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGI-QNLSFKVRS 97 

I I I : I : I : I : : : I I I I : I I : I I 

Db 37 PITLKFVDVCYRVKIHGM SNDSCNIKKLLGLKQKPSDETRSTEERT 82 

Qy 98 GQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRK 146 

I : : I : : I I I I : : : I I : : I I I I : : I : I I I : I : : : 

Db 83 ILSGVTGMISPGEFMAVLGPSGSGKSTLLNAVAGRLHGSNL-TGKILINDGKITKQTLKR 141 

Qy 147 CVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVG 206

I I : I I : I I I I I I I I : I : I I I I : : : : : I I I : I I I : I : I I I 
Db 142 -TGFVAQDDLLYPHLTVRETLVFVALLRLPRSLTRDVKIRAAESVISELGLTKCENTVVG 200

Qy 207 NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKG-NRLV 265

I : : II : I II I I : I I I I : I I II : I : I I I I I I I I I : I I I : I I : II I • I 
Db 201 NTFIRGISGGERKRVSIAHELLINPSLLVLDEPTSGLDATAALRLVQTLAGLAHGKGKTV 260 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLT 325

: I : I I I I : I : : I I I I I : : I : : : I : : I I : : I : I I I I I : I I 

Db 261 VTSIHQPSSRVFQMFDTVLLLSEGKCLFVGKGRDAMAYFESVGFSPAFPMNPADFLLDLA 320 

Qy 326 SIDRRSREQELATREK AQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLD 382

: : : I I I I : I : II : I I I : I I I I 

Db 321 — NGVCQTDGVTEREKPNVRQTLVTAY DTLLAPQVK TCIEVSHFPQD 365 

Qy 383 TNCLPSPTKMPGAVQQFTTLI RRQISNDFRDLPTLLIHGAEAC 425 

I I :: I III I I I I I : : 

Db 366 -NARFVKTRVNGG — GITTCIATWFSQLCILLHRLLKERRHESFD LLRIFQW 415 



Qy 426 LMSMTI GFLYFGHGS IQLS FMDTAALLFMI GAL I P FNVI LDVI S KCYS ERAML YYE 481 

I : | : : : | : I I I I I I I I I : III: I 

Db 416 AAS I LCGLMWW-HS D YR-DVHDRLGLLFFI S I FWGVLP S FNAVFT F PQERAIFTRE 469 

Qy 482 LEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLWFC 541 

I : I I 111:111 : III: I II I : I I I I : : I I 

Db 470 RASGMYTLSSYFMAHVLGSLSMELVLPASFLTFTYWMVYLRPGIVPFLLTLSVLLLYVLA 529 

Qy 542 CRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPA WISKVSFLRW 598

: : I I I : II : I I I I : : I II: I : I I : 

Db 530 SQGLGLALGAAIMDAKKASTIVTVTMLAFVLTGGYYVN KVPSGMVWMKYVSTTFY 584 

Qy 599 CFEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYA IYLIV 647 

I : I : I I : I I :: I I : I I : I 

Db 585 CYRLLVAIQYG SGEEILRMLGCDSKGKQGASAATSAGCRFVEEEV 629 

Qy 648 IGLSG GFMVLYYVSLRFIK 666

II I I : I I I : : I I I I 

Db 630 IGDVGMWTSVGVLFLMFFGYRVLAYLALRRIK 661



RESULT 12 
Q84TH5 

ID Q84TH5 PRELIMINARY; PRT; 662 AA.

AC Q84TH5; 

DT 01-JUN-2003 (TrEMBLrel. 24, Created) 

DT 01-JUN-2003 (TrEMBLrel. 24, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Putative ABC transporter protein. 

GN AT1G71960. 

OS Arabidopsis thaliana (Mouse-ear cress).

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis.

OX NCBI_TaxID=3702;

RN [1] 

RP SEQUENCE FROM N.A. 

RA Yamada K., Chan M.M., Chang C.H., Dale J.M., Hsuan V.W., Lee J.M.,

RA Onodera C.S., Quach H.L., Tang C.C., Toriumi M., Wong C., Wu H.C.,

RA Yu G., Yuan S., Chen H., Cheuk R., Jones T., Kim C.J., Nguyen M.,

RA Palm C.J., Shinn P., Southwick A., Tripp M.G., Wu T., Davis R.W.,

RA Ecker J.R., Theologis A.;

RT "Arabidopsis Open Reading Frame (ORF) Clones.";

RL Submitted (MAR-2003) to the EMBL/GenBank/DDBJ databases.

DR EMBL; BT005795; AAO64197.1; -.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.



SQ SEQUENCE 662 AA; 72902 MW; AA84BD738D2D488A CRC64;



Query Match 20.2%; Score 708; DB 10; Length 662; 

Best Local Similarity 31.2%; Pred. No. 2.2e-46; 

Matches 216; Conservative 107; Mismatches 233; Indels 136; Gaps 23; 

Qy 44 PNTLEVRDLNYQVDLASQVPWFEQLAQFKMPWTSPSCQNSCELGI-QNLSFKVRS 97 

I I I : I : I : I : : : I I I I : I I : I I 

Db 37 PITLKFVDVCYRVKIHGM SNDSCNIKKLLGLKQKPSDETRSTEERT 82 

Qy 98 GQMLAIIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRK 146 

I : : I : : I I I I : : : I I : : I I I I : : I : I I I : | : : : 

Db 83 ILSGVTGMISPGEFMAVLGPSGSGKSTLLNAVAGRLHGSNL-TGKILINDGKITKQTLKR 141 

Qy 147 CVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVG 206 

I I : I I : I I I I I I I I : I : I I I I : : : I I I : I I I : I : I I I 

Db 142 -TGFVAQDDLLYPHLTVRETLVFVALLRLPRSLTRDVKLRAAESVISELGLTKCENTVVG 200

Qy 207 NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKG-NRLV 265

I :: I I : I I I I I : I I I I : I I I I : I : I I I I I I I I I : I I I : I I : I I I : I 
Db 201 NTFIRGISGGERKRVSIAHELLINPSLLVLDEPTSGLDATAALRLVQTLAGLAHGKGKTV 260 

Qy 266 LISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLT 325

: I : I I I I : I : : I I I II : : I : : : I : : I I : : I : I I I I I : I I 

Db 261 VTSIHQPSSRVFQMFDTVLLLSEGKCLFVGKGRDAMAYFESVGFSPAFPMNPADFLLDLA 320

Qy 326 SIDRRSREQELATREK AQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLD 382 

: : : I I I I : I : II : I I I : I I I I 

Db 321 — NGVCQT D GVT E REK PNVRQT LVT AY DTLLAPQVK TCIEVSHFPQD 365 

Qy 383 TNCLPSPTKMPGAVQQFTTLI RRQISNDFRDLPTLLIHGAEAC 425 

I I : : I III I I I I I : : 

Db 366 -NARFVKTRVNGG — GITTCIATWFSQLCI LLHRLLKERRHES FD LLRIFQW 415 

Qy 426 LMSMTIGFLYFGHGSIQLSFMDTAALLFMI GAL I PFNVI LDVI SKCYSERAMLYYE 481 

I : I : : : I : I I I I I I I I I : III: I 

Db 416 AASILCGLMWW-HSDYR-DVHDRLGLLFFISIFWGVLPSFNAVFTF PQERAI FTRE 469 

Qy 482 LEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLVWLVVFC 541

I : I I II I : I I I : III: MINIMI : : I I 

Db 470 RASGMYTLSSYFMAHVLGSLSMELVLPASFLTFTYWMVYLRPGIVPFLLTLSVLLLYVLA 529

Qy 542 CRIMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSLWTVPA WISKVSFLRW 598 

: : I I I : II : I I I I : : I I I : I : I I : 

Db 530 SQGLGLALGAAIMDAKKASTIVTVTMLAFVLTGGYYVN KVPSGMVWMKYVSTTFY 584

Qy 599 CFEGLMKIQFSRRTYKMPLGNLTIAVSGDKILSAMELDSYPLYA IYLIV 647 

I : I : I I : I I : : I I : I I : I 

Db 585 CYRLLVAIQYG SGEEILRMLGCDSKGKQGASAATSAGCRFVEEEV 629 

Qy 648 IGLSG GFMVLYYVSLRFIK 666

II I I : I I I:: I I I I 

Db 630 IGDVGMWTSVGVLFLMFFGYRVLAYLALRRIK 661 



RESULT 13 
Q9C8W6 



ID Q9C8W6 PRELIMINARY; PRT; 609 AA. 

AC Q9C8W6; 

DT 01-JUN-2001 (TrEMBLrel. 17, Created) 

DT 01-JUN-2001 (TrEMBLrel. 17, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Putative ABC transporter. 

GN F17M19.11. 

OS Arabidopsis thaliana (Mouse-ear cress).

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis. 

OX NCBI_TaxID=3702; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=cv. Columbia; 

RX MEDLINE=21016719; PubMed=11130712;

RA Theologis A., Ecker J.R., Palm C.J., Federspiel N.A., Kaul S.,

RA White O., Alonso J., Altafi H., Araujo R., Bowman C.L., Brooks S.Y.,

RA Buehler E., Chan A., Chao Q., Chen H., Cheuk R.F., Chin C.W.,

RA Chung M.K., Conn L., Conway A.B., Conway A.R., Creasy T.H., Dewar K.,

RA Dunn P., Etgu P., Feldblyum T.V., Feng J.-D., Fong B., Fujii C.Y.,

RA Gill J.E., Goldsmith A.D., Haas B., Hansen N.F., Hughes B., Huizar L.,

RA Hunter J.L., Jenkins J., Johnson-Hopson C., Khan S., Khaykin E.,

RA Kim C.J., Koo H.L., Kremenetskaia I., Kurtz D.B., Kwan A., Lam B.,

RA Langin-Hooper S., Lee A., Lee J.M., Lenz C.A., Li J.H., Li Y.-P.,

RA Lin X., Liu S.X., Liu Z.A., Luros J.S., Maiti R., Marziali A.,

RA Militscher J., Miranda M., Nguyen M., Nierman W.C., Osborne B.I.,

RA Pai G., Peterson J., Pham P.K., Rizzo M., Rooney T., Rowley D.,

RA Sakano H., Salzerg S.L., Schwartz J.R., Shinn P., Southwick A.M.,

RA Sun H., Tallon L.J., Tambunga G., Toriumi M.J., Town C.D.,

RA Utterback T., Van Aken S., Vaysberg M., Vysotskaia V.S., Walker M.,

RA Wu D., Yu G., Fraser C.M., Venter J.C., Davis R.W.;

RT "Sequence and analysis of chromosome 1 of the plant Arabidopsis 

RT thaliana."; 

RL Nature 408:816-820(2000). 

DR EMBL; AC021665; AAG52231.1; -. 

DR PIR; E96742; E96742. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding. 

SQ SEQUENCE 609 AA; 67007 MW; 65D11A874E5C0B61 CRC64; 

Query Match 20.0%; Score 700; DB 10; Length 609; 

Best Local Similarity 31.8%; Pred. No. 8.2e-46; 

Matches 210; Conservative 103; Mismatches 225; Indels 122; Gaps 22; 

Qy 76 TSPSCQNSCELGI-QNLSFKVRS GQMLAIIGSSGCGRASLLDVI 118

: : I I I I : I I : I I I : : I : : I II I : : : I I : :

Db 2 SNDSCNIKKLLGLKQKPSDETRSTEERTILSGVTGMISPGEFMAVLGPSGSGKSTLLNAV 61



Qy 119 TGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRT 178 

I I I I : : I : I I I : I : : : I I : I I : I I I I I I I I : I : I I I I : 

Db 62 AGRLHGSNL-TGKILINDGKITKQTLKR-TGFVAQDDLLYPHLTVRETLVFVALLRLPRS 119 

Qy 179 FSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDE 238 

: : : : I I I : I I I : I : I I I I : : I I : I I I I I : I I I I : I I I I : I : I I I 
Db 120 LTRDVKLRAAESVISELGLTKCENTVVGNTFIRGISGGERKRVSIAHELLINPSLLVLDE 179

Qy 239 PTSGLDSFTAHNLVKTLSRLAKG-NRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAA 297 

I I I I I I : I 11:11:111 : I : I : I I I I : I : : I I I I I : : I : : : I 
Db 180 PTSGLDATAALRLVQTLAGLAHGKGKTVVTSIHQPSSRVFQMFDTVLLLSEGKCLFVGKG 239

Qy 298 QHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREK AQSLAALFLEKVR 354

: : I I : : I : I I I I I : I I : ' : : III I : I : 
Db 240 RDAMAYFESVGFSPAFPMNPADFLLDLA — NGVCQTDGVTEREKPNVRQTLVTAY 292

Qy 355 DLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTKMPGAVQQFTTLI 403 

II : I I I : I I I I I I : : I III 

Db 293 DTLLAPQVK TCIEVSHFPQD-NARFVKTRVNGG — GITTCIATWFSQLCILL 341 

Qy 404 RRQISNDFRDLPTLLIHGAEACLMSMTIGFLYFGHGSIQLSFMDTAALLFMI — 455 

I I I I I : : | : | : : : | : I I I I I 

Db 342 HRLLKERRHESFD LLRI FQWAAS I LCGLMWW-HSDYR-DVHDRLGLLFFI S I 392 

Qy 456 — GALIPFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGM 513 

MM: III: I I : II I I I : I I I : 

Db 393 FWGVL P S FNAVFT F PQERAI FTRERASGMYTLSSYFMAHVLGSLSMELVLPASFLT 448 

Qy 514 PTYWLANLRPGLQPFLLHFLLWLVVFCCRIMALAAAALLPTFHMASFFSNALYNSFYLA 573 

III: I I I I : I I I I : : I I : : I I I : II : I I 

Db 449 FTYWMVYLRPGIVPFLLTLSVLLLWLASQ 508 

Qy 574 GGFMINLSSLWTVPA WISKVS FLRWCFEGLMKIQFSRRTYKMPLGNLTIAVSGDKIL 630 

I I : : I II: I : II : I : I : I I : I I : : I I 

Db 509 GGYYVN KVP S GMVWMK YVS TT FYC YRLLVAI Q YG SGEEIL 548 

Qy 631 SAMELDSYPLYA IYLIVIGLSG GFMVLYYVSLRFIK 666 

: I I : I I I I I : I I I : : II I I 

Db 549 RMLGCDSKGKQGASAATSAGCRFVEEEVIGDVGiyMTSVGVLFLMFFGYRVLAYLALRRIK 608 



RESULT 14 
Q8T691 

ID Q8T691 PRELIMINARY; PRT; 801 AA. 

AC Q8T691; 

DT 01-JUN-2002 (TrEMBLrel. 21, Created) 

DT 01-JUN-2002 (TrEMBLrel. 21, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ABC transporter AbcG1. 

GN ABCG1. 

OS Dictyostelium discoideum (Slime mold). 

OC Eukaryota; Mycetozoa; Dictyosteliida; Dictyostelium. 

OX NCBI_TaxID=44689; 

RN [1] 

RP SEQUENCE FROM N.A. 



RC STRAIN=Ax4; 

RA Anjard C., Loomis W.F.; 

RT "Evolution of the ABC transporters of Dictyostelium."; 

RL Submitted (FEB-2002) to the EMBL/GenBank/DDBJ databases. 

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AF482380; AAL91485.1; -. 

DR GO; GO:0016020; C:membrane; IEA. 

DR GO; GO:0005524; F:ATP binding; IEA. 

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA. 

DR GO; GO:0000166; F:nucleotide binding; IEA. 

DR GO; GO:0006810; P:transport; IEA. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter. 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1. 

DR SMART; SM00382; AAA; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 

KW ATP-binding; Transport. 

SQ SEQUENCE 801 AA; 90052 MW; CCC4F0036CB195A3 CRC64; 

Query Match 19.8%; Score 695.5; DB 5; Length 801; 

Best Local Similarity 27.9%; Pred. No. 2.7e-45; 

Matches 187; Conservative 134; Mismatches 230; Indels 119; Gaps 20 

Qy 88 IQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGRGHGGKIK-SGQIWINGQPSSPQLVRK 146 

: I : : : I I : I I : I I I I : : I I I : : I I I I : : : I I I : : I 

Db 139 LTNINGHIESGTIFAIMGPSGAGKTTLLDIL AHRLNINGSGTMYLNGNKSDFNIFKK 195 

Qy 147 CVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVG 206 

: I I : I : I : I II I I I I I I I : : : I I : : : I I : I : I I : I : I I I I II 

Db 196 LCGYVTQSDSLMPSLTVRETLNFYAQLKMPRDVPLKEKLQRVQDIIDEMGLNRCADTLVG 255 

Qy 207 --NMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRL 264 

: : I I : I I I I I I I I : I : : I I I : : : I I I I I I I I I : I : : : I : I I I I 

Db 256 TADNKIRGISGGERRRVTISIELLTGPSVILLDEPTSGLDASTSFYVMSALKKLAKSGRT 315 

Qy 265 VLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDL 324 

: : : : I I I I I : I : : I I : I I : I I I I I : : I I I II I : I I I I I : : I I 
Db 316 IICTIHQPRSNIYDMFDNLLLLGDGNTIYYGKANKALEYFNANGYHCSEKTNPADFFLDL 375 

Qy 325 ■-- TSID 328 

: : : I 

Db 376 INTQVEDQADSDDDDYNDEEEEIGGGGGGSGGGAGGIEDIGISISPTMNGSAVDNIKNNE 435 

Qy 329 -RRSREQELATREKAQSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLD T 383 

:: ::|: :: I ::|: : : I I : :: I :: : I 

Db 436 LKQQQQQQQQQQQSTDGRARRRIKKLTKEEMVILKKEYPNSEQGLRVNETLDNISKENRT 495 

Qy 384 NCLPSPTKMPGAVQQFTTLIRRQISNDFRDLPTLLIHGAEACLMSMTIGFLYF GHG 439 

: I : I : I I : | : I : : : I I : : : | : I : I : I 

Db 496 DFKYEKTRGPNFLTQFSLLLGREVTNAKRHPMAFKVNLIQAIFQGLLCGIVYYQLGLGQS 555 

Qy 440 SIQLSFMDTAALLFMI-GALIP FNVI LDVI S KC YS ERAML Y YELEDGL YTTGP YF 493 

I : I I : I : I I I : I I I I : : I I hi I hi 

Db 556 SVQ S RTGWAFI IMGVS FPAVMSTI HVFPDVI T I FLKDRA SGVYDTLPFF 605 



Qy 494 FAK ILGELPEHCAYIIIYGMPTYWLANLRP GLQPFLLHFLLVWLWFCCR 543 

II I II II: I I : I I II I :: I I 

Db 606 LAKS FMD AC I AVL L PMVT AT I V YWMTNQRVDPFYSAAPFFRFVLMLVLASQTCL 659 

Qy 544 IMALAAAALLPTFHMASFFSNALYNSFYLAGGFMINLSSL — WTVPAWISKVSFLRWCFE 601 

: : :: : I : : : : I : I Mill:: II I : I I I : I 
Db 660 SLGVLISSSVPNVQVGTAVAPLIVILFFLFSGFFINLNDVPGWLV — WFPYISFFRYMIE 717 

Qy 602 GLMKIQFS RRTYKMPLGNLTIAVSGDKILSAM — ELDSY — PLYAIYLIVIGLSGG 653 

: I I : I : I : : : I : : I : : : : I : I I 
Db 718 AAVI NAFKDVH FT CT D S QKI GGVC P VQ YGNNVT ENMGYD I DH FWRNVWI LVL YI I -G 773 

Qy 654 FMVLYYVSLR 663 

I I I : : I : 

Db 774 FRVLTFLVLK 783 



RESULT 15 
Q7TSR8 



ID Q7TSR8 PRELIMINARY; PRT; 652 AA. 

AC Q7TSR8; 

DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette sub-family G member 5. 

GN ABCG5. 

OS Mus musculus (Mouse). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=I/LnJ; TISSUE=Liver; 

RA Wittenburg H., Lyons M.A., Li R., Churchill G.A., Carey M.C., 

RA Paigen B.; 

RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone 

RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred 

RT Mice."; 

RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases. 

DR EMBL; AY195872; AAO45093.1; -. 

KW ATP-binding. 

SQ SEQUENCE 652 AA; 73236 MW; 0125FB617DE296B9 CRC64; 



Query Match 19.6%; Score 686.5; DB 11; Length 652; 

Best Local Similarity 28.6%; Pred. No. 1e-44; 

Matches 188; Conservative 128; Mismatches 242; Indels 99; Gaps 16 

Qy 45 NTLEVRDLNYQVDLASQV- PWFEQLAQFKMPWTS PSCQNSCELGI -QNLS FKVRSGQMLA 102 

: : I I : : I I : : : I I I I III : I : : : I : I I I : : 

Db 37 HSLGVLHVSYSV— SNRVGPW WNIKSCQQKWDRQILKDVSLYIESGQIMC 84 

Qy 103 IIGSSGCGRASLLDVITGRGHGGKIKSGQIWINGQPSSPQLVRKCVAHVRQHNQLLPNLT 162 

I : I I II I :: I I I I : II I : : : II : I : : I I : I : I I 

Db 85 ILGSSGSGKTTLLDAISGRLRCTGTLEGDVFVNGCELRRDQFQDCFSYVLQSDVFLSSLT 144 

Qy 163 VRETLAFIAQMRLPRTFSQAQRDKRVEDVIAELRLRQCADTRVGNMYVRGLSGGERRRVS 222 

I I I I I : I : I I : I : I : I I I : I I I II : I : I : I II I I I I I 



Db 145 VRETLRYTAMLALCRS-SADFYNKKVEAVMTELSLSHVADQVIGSYNFGGISSGERRRVS 203 

Qy 223 IGVQLLWNPGILILDEPTSGLDSFTAHNLVKTLSRLAKGNRLVLISLHQPRSDIFRLFDL 282 

I I I I : I :: : I I I I I : I I I ||: :| |: ||: : | : | : : : : | | | | | : : | : || 
Db 204 IAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAE 263 

Qy 283 VLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRYSNPADFYVDLTSIDRRSREQELATREKA 342 

: = : I I : : I : I : : I I I I I I : I I I I I I : I I I I : I : I I I : I : I : : 
Db 264 IAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFDFYMDLTSVDTQSREREIETYKRV 323 

Qy 343 QSLAALFLEKVRDLDDFLWKAETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTT 401 

I I : I I I I :::: ::: | :| II II : 

Db 324 QMLESAFKE SDIYHKI-LENIERARYLKTLPT VPFKTKDPPGMFGKLGV 371 

Qy 402 LIRRQISNDFRDLPTLLIHGAEACLMSMTIGF--LYFGHGSIQLSFMDTAALLFMIGALI 459 

I : I I I I : : : : : : I : : | | : : : : : | | | : 
Db 372 LLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDRVGLLYQFVGAT 431 

Qy 460 PFNVILDVISKCYSERAMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLA 519 

I : : I : : : I I : I : I I I I I : I II : I : | | 

Db 432 PYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHALPFSIIATVIFSSVCYWTL 491 

Qy 520 NLRPGLQPFLLHFLLWLWFCCRIMALAAAALLPTFHMASFFSNAL 566 

I I : I : I I I I : | : | 

Db 492 GLYPEVARF GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSI 534 

Qy 567 YNSFYLAGGFMINLSSLWTVPAWISKVS FLRWCFEGLMKIQFSRRTYKMPLGNLT 621 

: : I I : I : : : : | : : | | | : : | | : | | 
Db 535 VALLSISGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEF YGL NFT 587 

Qy 622 IAVSGDKI LSAMELDS YPLYAI YLIVIGLSGGFMVL 657 

I : I * : : | : M : I I : I : : I 

Db 588 CGESNTTML NHPMCAITQGVEFIEKTCPGATSRFTANFLILYGFIPALVIL 638 

Search completed: February 27, 2004, 07:15:30 
Job time : 39.3606 secs 



